Abstract. In the context of multiple hypothesis testing, the proportion $\pi_0$ of true null
hypotheses in the pool of hypotheses to test often plays a crucial role, although it is generally
unknown a priori. A testing procedure using an implicit or explicit estimate of this quantity
in order to improve its efficiency is called adaptive. In this paper, we focus on the issue of
False Discovery Rate (FDR) control and we present new adaptive multiple testing procedures
with control of the FDR. First, in the context of assuming independent p-values, we
present two new procedures and give a unified review of other existing adaptive procedures
that have provably controlled FDR. We report extensive simulation results comparing these
procedures and testing their robustness when the independence assumption is violated. The
new proposed procedures appear competitive with existing ones. The overall best, though, is
reported to be Storey's estimator, but for a parameter setting that does not appear to have
been considered before. Second, we propose adaptive versions of step-up procedures that
have provably controlled FDR under positive dependences and unspecified dependences of
the p-values, respectively. While simulations only show an improvement over non-adaptive
procedures in limited situations, these are to our knowledge among the first theoretically
founded adaptive multiple testing procedures that control the FDR when the p-values are
not independent.

1 Introduction

1.1 Adaptive multiple testing procedures


Spurred by an increasing number of application fields, in particular bioinformatics, the topic of
multiple testing (which enjoys a long history in the statistics literature) has attracted renewed
and growing attention in recent years. For example, using microarray data, the goal is to detect
which genes (among several tens of thousands) exhibit a significantly different level of expression
in two different experimental conditions. Each gene represents a "hypothesis" to be tested in the
statistical sense. The genes' expression levels fluctuate naturally (not to speak of other sources
of fluctuation introduced by the experimental protocol), and, because there are so many genes
to choose from, it is important to control precisely what can be deemed a significant observed
difference. Generally it is assumed that the natural fluctuation distribution of a single gene is
known and the problem is to take into account the number of genes involved (for more details, see
for instance Dudoit et al., 2003).

In this work, we focus on building multiple testing procedures with a control of the false
discovery rate (FDR). This quantity is defined as the expected proportion of type I errors, that is,
the proportion of true null hypotheses among all the null hypotheses that have been rejected (i.e.
declared as false) by the procedure. In their seminal work on this topic, Benjamini and Hochberg
(1995) proposed the celebrated linear step-up (LSU) procedure, which is proved to control the FDR
under independence between the p-values. Later, it was proved (Benjamini and Yekutieli, 2001)
that the LSU procedure still controls the FDR when the p-values have positive dependences (or
more precisely, a specific form of positive dependence called PRDS). Under unspecified
dependences, the same authors have shown that the FDR control still holds if the threshold collection
of the LSU procedure is divided by a factor $1 + 1/2 + \dots + 1/m$, where $m$ is the total number of
null hypotheses to test.



More recently, the latter result has been generalized (Blanchard and Fleuret, 2007; Blanchard
and Roquain, 2008; Sarkar, 2008a), by showing that there is a family of step-up procedures
(depending on the choice of a kind of prior distribution) that still control the FDR under unspecified
dependences between the p-values.

However, all of these procedures, which are built in order to control the FDR at a level $\alpha$, can be
shown to actually have their FDR upper bounded by $\pi_0\alpha$, where $\pi_0$ is the proportion of true null
hypotheses in the initial pool. Therefore, when most of the hypotheses are false (i.e., $\pi_0$ is small),
these procedures are inevitably conservative, since their FDR is in actuality much lower than the
fixed target $\alpha$. In this context, the challenge of adaptive control of the FDR (e.g., Benjamini and
Hochberg, 2000; Black, 2004) is to integrate an estimation of the unknown proportion $\pi_0$ in the
threshold of the previous procedures and to prove that the FDR is still rigorously controlled by $\alpha$.

Adaptive procedures are therefore of practical interest if it is expected that $\pi_0$ is, or can
be, significantly smaller than 1. An example of such a situation occurs when using hierarchical
procedures (e.g., Benjamini and Heller, 2007) which first select some clusters of hypotheses that
are likely to contain false nulls, and then apply a multiple testing procedure on the selected
hypotheses. Since a large part of the selected null hypotheses is expected to be false in the second
step, an adaptive procedure is needed in order to keep the FDR close to the target level.

A number of adaptive procedures have been proposed in the recent literature and can loosely 
be divided into the following categories: 

- plug-in procedures, where some initial estimator of $\pi_0$ is directly plugged in as a multiplicative
  level correction to the usual procedures. In some cases (e.g. Storey's estimator, see Storey,
  2002), the resulting plug-in adaptive procedure (or a variation thereof) has been proved to
  control the FDR at the desired level (Benjamini et al., 2006; Storey et al., 2004). A variety
  of other estimators of $\pi_0$ have been proposed (e.g. Meinshausen and Rice, 2006; Jin and Cai,
  2007; Jin, 2008); while their asymptotic consistency (as the number of hypotheses tends to
  infinity) is generally established, their use in plug-in adaptive procedures has not always been
  studied.

- two-stage procedures: in this approach, a first round of multiple hypothesis testing is performed
  using some fixed algorithm, then the results of this first round are used in order to tune the
  parameters of a second round in an adaptive way. This can generally be interpreted as using the
  output of the first stage to estimate $\pi_0$. Different procedures following this general approach
  have been proposed (Benjamini et al., 2006; Sarkar, 2008a; Farcomeni, 2007); more generally,
  multiple-stage procedures can be considered.

- one-stage procedures, which perform a single round of multiple testing (generally step-up or
  step-down), based on a particular threshold collection that renders it adaptive (Finner et al.,
  2009; Gavrilov et al., 2009).

In addition, some other works (Genovese and Wasserman, 2004; Storey et al., 2004; Finner et al.,
2009) have studied the question of adaptivity to the parameter $\pi_0$ from an asymptotic viewpoint.
In this framework, the more specific random effects model is most often (though not always)
considered, in which p-values are assumed independent, each hypothesis has a probability $\pi_0$
of being true, and all false null hypotheses share the same alternate distribution. The behavior
of different procedures is then studied under the limit where the number of tested hypotheses
grows to infinity. One advantage of this approach and specific model is that it allows one to derive
quite precise results (see Neuvial, 2008, for a precise study of limiting behaviors of central limit
type under this model, including for some of the new procedures introduced in the present paper).
However, we emphasize that in the present work our focus is decidedly on the nonasymptotic side,
using finite samples and arbitrary alternate hypotheses.

To complete this overview, let us also mention another interesting and different direction
opened up recently, that of adaptivity to the alternate distribution. If the alternate distribution
is known a priori, it is well-known that the optimal testing statistics are likelihood ratios between
the null and the alternate (which can then be transformed into p-values). When the alternate is
unknown though, one can hope to estimate, implicitly or explicitly, the alternate distribution from
the observed data, and consequently approximate the optimal test statistics (Sun and Cai, 2007
proposed an asymptotically consistent approach to this end; see also Spjøtvoll, 1972; Storey, 2007).
Interestingly, this point of view is also intimately linked to some traditional topics in statistical
learning such as classification and optimal novelty detection (see, e.g., Scott and Blanchard,
2009). However, in the present paper we will focus on adaptivity to the parameter $\pi_0$ only.

1.2 Overview of this paper 

The contributions of the present paper are the following. A first goal of the paper is to introduce 
a number of novel adaptive procedures: 

1. We introduce a new one-stage step-up procedure that is more powerful than the standard
   LSU procedure in a large range of situations, and provably controls the FDR under
   independence. This procedure is called one-stage adaptive, because the estimation of $\pi_0$ is
   performed implicitly.

2. Based on this, we then build a new two-stage adaptive procedure, which is more powerful in
   general than the procedure proposed by Benjamini et al. (2006), while provably controlling
   the FDR under independence.

3. Under the assumption of positive or arbitrary dependence of the p-values, we introduce new
   two-stage adaptive versions of known step-up procedures (namely, of the LSU under positive
   dependences, and of the family of procedures introduced by Blanchard and Fleuret, 2007,
   under unspecified dependences). These adaptive versions provably control the FDR and result
   in an improvement of power over the non-adaptive versions in some situations (namely, when
   the number of hypotheses rejected in the first stage is large, typically more than 60%).

A second goal of this work is to present a review of several existing adaptive step-up procedures 
with provable FDR control under independence. For this, we present the theoretical FDR control 
as a consequence of a single general theorem for plug-in procedures, which was first established 
by Benjamini et al. (2006). Here, we give a short self-contained proof of this result that is of 
independent interest. The latter is based on some tools introduced earlier (Blanchard and Roquain, 
2008; Roquain, 2007), that aim to unify FDR control proofs. Related results and tools also appear 
independently in Finner et al. (2009); Sarkar (2008b). 

A third goal is to compare both the existing and our new adaptive procedures in an extensive
simulation study under both independence and dependence, following the simulation model
and methodology used by Benjamini et al. (2006). Concerning the new one- and two-stage
procedures with theoretical FDR control under independence, these are generally quite competitive in
comparison to existing ones. However we also report that the best procedure overall (in terms of
power, among procedures that are robust enough to the dependent case) appears to be the plug-in
procedure based on the well-known Storey estimator (Storey, 2002) used with the somewhat
nonstandard parameter $\lambda = \alpha$. This outcome was in part unexpected since to the best of our
knowledge, this fact had never been pointed out so far (the usual default recommended choice is
$\lambda = 1/2$ and turns out to be very unstable in dependent situations); this is therefore an important
conclusion of this paper regarding practical use of these procedures.

Concerning the new two-stage procedures with theoretical FDR control under dependence,
simulations show an (admittedly limited) improvement over their non-adaptive counterparts in favorable
situations which correspond to what was expected from the theoretical study (large proportion
of false hypotheses). The observed improvement is unfortunately not striking enough to be able
to recommend using these procedures in practice; their interest is therefore at this point mainly
theoretical, in that these are to our knowledge among the first theoretically founded adaptive
multiple testing procedures that control the FDR when the p-values are not independent.

The paper is organized as follows: in Section 2, we introduce the mathematical framework, and
we recall the existing non-adaptive results in FDR control. In Section 3 we deal with the setup
of independent p-values. We present our new procedures and review the existing ones, and finally
compare them in a simulation study. The case of positively dependent and arbitrarily dependent
p-values is examined in Section 4, where we introduce our new adaptive procedures in this context.
A conclusion is given in Section 5. Sections 6 and 7 contain proofs of the results and lemmas,
respectively. Some technical remarks and discussions of links to other work are gathered at the
end of each relevant subsection, and can be skipped by the non-specialist reader.

2 Preliminaries

2.1 Multiple testing framework 

In this paper, we will stick to the traditional statistical framework for multiple testing. Let
$(\mathcal{X}, \mathfrak{X}, P)$ be a probability space; we want to infer a decision on $P$ from an
observation $x$ in $\mathcal{X}$ drawn from $P$. Let $\mathcal{H}$ be a finite set of null hypotheses
for $P$, that is, each null hypothesis $h \in \mathcal{H}$ corresponds to some subset of distributions
on $(\mathcal{X}, \mathfrak{X})$ and "$P$ satisfies $h$" means that $P$ belongs to this subset
of distributions. The number of null hypotheses $|\mathcal{H}|$ is denoted by $m$, where $|\cdot|$ is the
cardinality function. The underlying probability $P$ being fixed, we denote by
$\mathcal{H}_0 = \{h \in \mathcal{H} \mid P \text{ satisfies } h\}$ the set of the true null hypotheses and by
$m_0 = |\mathcal{H}_0|$ the number of true null hypotheses. We also let $\pi_0 := m_0/m$ be the proportion
of true null hypotheses. We stress that $\mathcal{H}_0$, $m_0$, and $\pi_0$ are unknown
and implicitly depend on the unknown $P$. All the results to come are always implicitly meant to
hold for any generating distribution $P$.

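We suppose further that there exists a set of p-value functions $\mathbf{p} = (p_h, h \in \mathcal{H})$,
meaning that each $p_h : (\mathcal{X}, \mathfrak{X}) \to [0, 1]$ is a measurable function and that for each
$h \in \mathcal{H}_0$, $p_h$ is bounded stochastically by a uniform distribution, that is,

$$\forall h \in \mathcal{H}_0, \quad \forall t \in [0,1], \quad P(p_h \le t) \le t. \qquad (1)$$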

Typically, p-values are obtained from statistics that have a known distribution $P_0$ under the
corresponding null hypothesis. In this case, if $F_0$ denotes the corresponding cumulative distribution
function, applying $1 - F_0$ to the observed statistic results in a random variable satisfying (1) in
general. Here, we are however not concerned with how these p-values are constructed and only assume
that they exist and are known (this is the standard setting in multiple testing).
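As an illustration of the construction just described, the following minimal sketch applies $1 - F_0$ to an observed statistic, assuming (for illustration only) that the null distribution $P_0$ is Gaussian. The function name `upper_tail_p_value` and its parameters are hypothetical and not part of the paper.

```python
from statistics import NormalDist


def upper_tail_p_value(statistic: float, null_mean: float = 0.0, null_sd: float = 1.0) -> float:
    """Return 1 - F_0(statistic), where F_0 is an assumed Gaussian null c.d.f."""
    return 1.0 - NormalDist(null_mean, null_sd).cdf(statistic)


# Larger statistics give smaller p-values under this one-sided construction.
p_small_stat = upper_tail_p_value(1.0)
p_large_stat = upper_tail_p_value(2.0)
```

Under the null, a continuous statistic transformed this way is uniform on $[0, 1]$, so condition (1) holds with equality.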

2.2 Multiple testing procedure and errors 

A multiple testing procedure is a measurable function

$$R : x \in \mathcal{X} \mapsto R(x) \in \mathcal{P}(\mathcal{H}),$$

which takes as input an observation $x$ and returns a subset of $\mathcal{H}$, corresponding to the rejected
hypotheses. As is commonly the case, we will focus here on multiple testing procedures based on
p-values, that is, we will implicitly assume that $R$ is of the form $R(\mathbf{p})$.

A multiple testing procedure $R$ can make two kinds of errors: a type I error occurs for $h$ when $h$
is true and is rejected by $R$, that is, $h \in \mathcal{H}_0 \cap R$. Following the Neyman-Pearson general
philosophy for hypothesis testing, the primary concern is to control the number of type I errors of a
testing procedure. Conversely, a type II error occurs for $h$ when $h$ is false and is not rejected by $R$,
that is, $h \in \mathcal{H}_0^c \cap R^c$.

The most traditional way to control type I error is to upper bound the "Family-wise error rate"
(FWER), which is the probability that one or more true null hypotheses are wrongly rejected.
However, procedures with a controlled FWER are very "cautious" not to make a single error, and
thus reject only few hypotheses. This conservative way of measuring the type I error for multiple
hypothesis testing can be a serious hindrance in practice, since it requires collecting large enough
datasets so that significant evidence can be found under this strict error control criterion. More
recently, a more liberal measure of type I errors has been introduced in multiple testing (Benjamini
and Hochberg, 1995): the false discovery rate (FDR), which is the averaged proportion of true null
hypotheses in the set of all the rejected hypotheses:






Definition 2.1 (False discovery rate). The false discovery rate of a multiple testing procedure
$R$ for a generating distribution $P$ is given by

$$\mathrm{FDR}(R) := \mathbb{E}\left[\frac{|R \cap \mathcal{H}_0|}{|R|}\,\mathbf{1}\{|R| > 0\}\right]. \qquad (2)$$



Remark 2.2. Throughout this paper we will use the following convention: whenever there is an
indicator function inside an expectation, this has logical priority over any other factor appearing
in the expectation. What we mean is that if other factors include expressions that may not be
defined (such as the ratio $\frac{0}{0}$) outside of the set defined by the indicator, this is safely ignored.
This results in more compact notations, such as in the above definition. Note also again that the
dependence of the FDR on the unknown $P$ is implicit.
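The quantity inside the expectation in (2), together with the indicator convention of Remark 2.2, can be sketched as follows. This is only an illustration: the function name is hypothetical, and hypotheses are represented by arbitrary hashable labels.

```python
def false_discovery_proportion(rejected: set, true_nulls: set) -> float:
    """|R ∩ H0| / |R| times the indicator {|R| > 0}; the 0/0 case is read as 0."""
    if not rejected:
        # The indicator has logical priority: no rejection means no false discovery.
        return 0.0
    return len(rejected & true_nulls) / len(rejected)
```

The FDR is the expectation of this proportion over the distribution of the data; in practice it can only be approximated by averaging over simulated datasets for which the set of true nulls is known.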

A classical aim, then, is to build procedures $R$ with FDR upper bounded at a given, fixed level
$\alpha$. Of course, if we choose $R = \emptyset$, meaning that $R$ rejects no hypotheses, trivially
$\mathrm{FDR}(R) = 0 \le \alpha$. Therefore, it is desirable to build procedures $R$ satisfying
$\mathrm{FDR}(R) \le \alpha$ while at the same time having as few type II errors as possible. As a general
rule, provided that $\mathrm{FDR}(R) \le \alpha$, we want to build procedures that reject as many false
hypotheses as possible. The absolute power of a multiple testing procedure is defined as the average
proportion of false hypotheses correctly rejected,
$\mathbb{E}\left[|R \cap \mathcal{H}_0^c|\right] / |\mathcal{H}_0^c|$. Given two procedures $R$ and $R'$, a
particularly simple sufficient condition for $R$ to be more powerful than $R'$ is when $R' \subset R$ holds
pointwise. We will say in this case that $R$ is (uniformly) less conservative than $R'$.

2.3 Self-consistency, step-up procedures, FDR control and adaptivity 

We first define a general class of multiple testing procedures called self-consistent procedures
(Blanchard and Roquain, 2008).

Definition 2.3 (Self-consistency, non-increasing procedure). Let
$\Delta : \{0, 1, \dots, m\} \to \mathbb{R}^+$, $\Delta(0) = 0$, be a function called threshold collection; a
multiple testing procedure $R$ is said to satisfy the self-consistency condition with respect to $\Delta$ if

$$R \subset \{h \in \mathcal{H} \mid p_h \le \Delta(|R|)\} \qquad (3)$$

holds almost surely. Furthermore, we say that $R$ is non-increasing if for all $h \in \mathcal{H}$ the function
$p_h \mapsto |R(\mathbf{p})|$ is non-increasing, that is, if $|R|$ is non-increasing in each p-value.

The class of self-consistent procedures includes well-known types of procedures, notably step-up
and step-down. The assumption that a procedure is non-increasing, which is required in addition
to self-consistency in some of the results to come, is relatively natural as a lower p-value means
we have more evidence to reject the corresponding hypothesis. We will mainly focus on the step-up
procedure, which we define now. For this, we sort the p-values in increasing order using the
notation $p_{(1)} \le \dots \le p_{(m)}$ and put $p_{(0)} = 0$. This order is of course itself random since it
depends on the observation.

Definition 2.4 (Step-up procedure). The step-up procedure with threshold collection $\Delta$ is defined as

$$R = \{h \in \mathcal{H} \mid p_h \le p_{(k)}\}, \quad \text{where } k = \max\{0 \le i \le m \mid p_{(i)} \le \Delta(i)\}.$$

A trivial but important property of a step-up procedure is the following.

Lemma 2.5. The step-up procedure with threshold collection $\Delta$ is non-increasing and self-consistent
with respect to $\Delta$.

Therefore, a result valid for any non-increasing self-consistent procedure also holds for the
corresponding step-up procedure. This will be used extensively throughout the paper and thus should be
kept in mind by the reader.

Among all procedures that are self-consistent with respect to $\Delta$, the step-up is uniformly less
conservative than any other (Blanchard and Roquain, 2008) and is therefore of primary interest.
However, to recover procedures of a more general form (including step-down for instance), the
statements of this paper will preferably be expressed in terms of self-consistent procedures whenever
possible.
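The step-up procedure of Definition 2.4 can be sketched in a few lines. The code below is a minimal illustration rather than a reference implementation: the function name `step_up`, the dictionary representation of the p-values and the example values are all assumptions made for this sketch.

```python
from typing import Callable, Dict, Hashable, Set


def step_up(p_values: Dict[Hashable, float], threshold: Callable[[int], float]) -> Set[Hashable]:
    """Step-up procedure for a threshold collection i -> Delta(i), with Delta(0) = 0."""
    ordered = sorted(p_values.values())      # p_(1) <= ... <= p_(m)
    k = 0                                    # i = 0 always qualifies, since p_(0) = 0
    for i, p in enumerate(ordered, start=1):
        if p <= threshold(i):
            k = i                            # keep the LARGEST i with p_(i) <= Delta(i)
    if k == 0:
        return set()
    cutoff = ordered[k - 1]                  # p_(k)
    return {h for h, p in p_values.items() if p <= cutoff}


# Example with the linear thresholds alpha * i / m (alpha = 0.05, m = 5).
example = {"a": 0.001, "b": 0.012, "c": 0.035, "d": 0.039, "e": 0.5}
rejected = step_up(example, lambda i: 0.05 * i / 5)
```

In this example hypothesis `c` is rejected although its p-value exceeds its own threshold (0.035 > 0.03), because the fourth ordered p-value lies below the fourth threshold. This is what distinguishes step-up from step-down, which would stop at the first violation.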

Threshold collections are generally scaled by the target FDR level $\alpha$. Once correspondingly
rewritten under the normalized form $\Delta(i) = \alpha\beta(i)/m$, we will call $\beta$ the shape function for
threshold collection $\Delta$. In the particular case where the shape function $\beta$ is the identity function,
the procedure is called the linear step-up (LSU) procedure (at level $\alpha$).

The LSU plays a prominent role in multiple testing under FDR control; it was the first proce- 
dure for which FDR control was proved and it is probably the most widely used procedure in this 
context. More precisely, when the p-values are assumed to be independent, the following theorem 
holds. 

Theorem 2.6. Suppose that the p-values of $\mathbf{p} = (p_h, h \in \mathcal{H})$ are independent. Then any
non-increasing self-consistent procedure with respect to threshold collection $\Delta(i) = \alpha i/m$ has FDR
upper bounded by $\pi_0\alpha$, where $\pi_0 = m_0/m$ is the proportion of true null hypotheses. (In particular,
this is the case for the linear step-up procedure.) Moreover, if the p-values associated to true null
hypotheses are exactly distributed like a uniform distribution, the linear step-up procedure has FDR
equal to $\pi_0\alpha$.

The first part of this result, in the case of the LSU, was proved in the landmark paper of
Benjamini and Hochberg (1995); the second part (also for the LSU) was proved by Benjamini and
Yekutieli (2001) and Finner and Roters (2001).
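The equality case of Theorem 2.6 can be checked numerically. The sketch below is a small Monte Carlo experiment under assumptions chosen purely for illustration: $m = 20$ hypotheses of which $m_0 = 10$ are true nulls with uniform p-values, and false nulls with p-values distributed as $U^4$ (an arbitrary alternative that is stochastically smaller than uniform). None of these settings come from the paper.

```python
import random


def lsu_rejections(p_values, alpha):
    """Indices rejected by the linear step-up procedure at level alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda j: p_values[j])
    k = 0
    for i, j in enumerate(order, start=1):
        if p_values[j] <= alpha * i / m:
            k = i
    # With continuous p-values there are no ties, so the k smallest are exactly {p_h <= p_(k)}.
    return set(order[:k])


def estimate_fdr(m=20, m0=10, alpha=0.05, n_rep=20000, seed=0):
    """Monte Carlo average of the false discovery proportion under independence."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_rep):
        # Indices 0..m0-1 are true nulls (uniform); the rest are false nulls (U**4, assumed).
        p = [rng.random() for _ in range(m0)] + [rng.random() ** 4 for _ in range(m - m0)]
        rejected = lsu_rejections(p, alpha)
        if rejected:
            total += sum(1 for j in rejected if j < m0) / len(rejected)
    return total / n_rep


fdr_estimate = estimate_fdr()
```

With these settings the theorem predicts an FDR of $\pi_0\alpha = 0.5 \times 0.05 = 0.025$, half the target level, whatever the alternative distribution; the estimate should agree up to Monte Carlo error.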

Benjamini and Yekutieli (2001) extended the previous result about FDR control of the linear
step-up procedure to the case of p-values with a certain form of positive dependence called positive
regressive dependence from a subset (PRDS). We skip a formal definition for now (we will get 
back to this topic in Section 4). The extension of this result to self-consistent procedures (in the 
independent as well as PRDS cases) was established by Blanchard and Roquain (2008) and Finner 
et al. (2009). 

However, when no particular assumptions are made on the dependences between the p-values, it 
can be shown that the above FDR control is not generally true. This situation is called unspecified 
or arbitrary dependence. A modification of the LSU was first proposed in Benjamini and Yekutieli 
(2001) which was proved to have a controlled FDR under arbitrary dependence. This result was 
extended by Blanchard and Fleuret (2007) and Blanchard and Roquain (2008) (see also a related 
result of Sarkar, 2008a): it can be shown that self-consistent procedures (not necessarily
non-increasing) based on a particular class of shape functions have controlled FDR:

Theorem 2.7. Under unspecified dependences between the p-values of $\mathbf{p} = (p_h, h \in \mathcal{H})$,
consider $\beta$ a shape function of the form

$$\beta(r) = \int_0^r u \, d\nu(u), \qquad (4)$$

where $\nu$ is some fixed a priori probability distribution on $(0, \infty)$. Then any self-consistent procedure
with respect to threshold collection $\Delta(i) = \alpha\beta(i)/m$ has FDR upper bounded by $\alpha\pi_0$.

To recap, in all of the above cases, the FDR is actually controlled at the level $\pi_0\alpha$ instead of the
target $\alpha$. Hence, a direct corollary of both of the above theorems is that the step-up procedure with
shape function $\beta^*(x) = \pi_0^{-1}\beta(x)$ has FDR upper bounded by $\alpha$ in either of the following
situations:

- $\beta(x) = x$ when the p-values are independent or PRDS,

- the shape function $\beta$ is of the form (4) when the p-values have unspecified dependences.

Note that, since $\pi_0 \le 1$, using $\beta^*$ always gives rise to a less conservative procedure than using
$\beta$ (especially when $\pi_0$ is small). However, since $\pi_0$ is unknown, the shape function $\beta^*$ is not
directly accessible. We therefore will call the step-up procedure using $\beta^*$ the Oracle step-up procedure
based on shape function $\beta$ (corresponding to one of the above cases).
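To make the shape functions of Theorem 2.7 concrete, the sketch below evaluates $\beta(r) = \int_0^r u \, d\nu(u)$ for a discrete prior $\nu$, where the integral reduces to a finite sum. The function name `shape_function` and the particular prior are illustrative assumptions. With $\nu(\{k\})$ proportional to $1/k$ on $\{1, \dots, m\}$, a direct computation gives $\beta(i) = i / (1 + 1/2 + \dots + 1/m)$, that is, the linear shape divided by the harmonic factor mentioned in Section 1.1.

```python
def shape_function(nu: dict):
    """Return beta(r) = sum of u * nu({u}) over atoms u <= r, for a discrete prior nu."""
    if abs(sum(nu.values()) - 1.0) > 1e-9:
        raise ValueError("nu must be a probability distribution")

    def beta(r: float) -> float:
        return sum(u * weight for u, weight in nu.items() if u <= r)

    return beta


m = 10
harmonic = sum(1.0 / k for k in range(1, m + 1))
nu = {k: (1.0 / k) / harmonic for k in range(1, m + 1)}   # nu({k}) proportional to 1/k
beta = shape_function(nu)
```

Other choices of $\nu$ concentrate the mass of the prior on the range of rejection numbers one expects, at the price of smaller thresholds elsewhere.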
Simply put, the role of adaptive step-up procedures is to mimic the latter oracle in order to
obtain more powerful procedures. Adaptive procedures are often step-up procedures using the
modified shape function $G\beta$, where $G$ is some estimator of $\pi_0^{-1}$:

Definition 2.8 (Plug-in adaptive step-up procedure). Given a level $\alpha \in (0,1)$, a shape
function $\beta$ and an estimator $G : [0, 1]^{\mathcal{H}} \to (0, \infty)$ of the quantity $\pi_0^{-1}$, the plug-in
adaptive step-up procedure of shape function $\beta$ and using estimator $G$ (at level $\alpha$) is defined as

$$R = \{h \in \mathcal{H} \mid p_h \le p_{(k)}\}, \quad \text{where } k = \max\{i \mid p_{(i)} \le \alpha\beta(i)G(\mathbf{p})/m\}.$$

The (data-dependent) function $\Delta(\mathbf{p}, i) = \alpha\beta(i)G(\mathbf{p})/m$ is called the adaptive threshold
collection corresponding to the procedure. In the particular case where the shape function $\beta$ is the
identity function on $\mathbb{R}^+$, the procedure is called an adaptive linear step-up procedure using
estimator $G$ (and at level $\alpha$).

Following the previous definition, an adaptive plug-in procedure is composed of two different
steps:

1. Estimate $\pi_0^{-1}$ with an estimator $G$.

2. Take the step-up procedure of shape function $G\beta$.

A subclass of plug-in adaptive procedures is formed by so-called two-stage procedures, when the
estimator $G$ is actually based on a first, non-adaptive, multiple testing procedure. This can
obviously be iterated and lead to multi-stage procedures. The distinction between generic
plug-in procedures and two-stage procedures is somewhat informal and generally meant only to
provide some kind of nomenclature between different possible approaches.

The main theoretical task is to ensure that an adaptive procedure of this type still correctly 
controls the FDR. The mathematical difficulty obviously comes from the additional random vari- 
ations of the estimator G in the procedure. 
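The two steps of a plug-in adaptive step-up procedure can be sketched as follows. The estimator is passed in as an arbitrary callable; the constant estimators used in the example are hypothetical stand-ins (the value 2 mimics an oracle that knows $\pi_0 = 1/2$) and not estimators studied in the paper. The sketch says nothing about FDR control, which depends entirely on the estimator chosen.

```python
from typing import Callable, List, Sequence


def plug_in_step_up(p: Sequence[float], alpha: float,
                    estimator: Callable[[Sequence[float]], float],
                    shape: Callable[[int], float] = float) -> List[int]:
    """Indices rejected by the plug-in adaptive step-up procedure of Definition 2.8."""
    m = len(p)
    g = estimator(p)                              # step 1: estimate 1 / pi_0 from the data
    ordered = sorted(p)
    k = 0
    for i, p_i in enumerate(ordered, start=1):
        if p_i <= alpha * shape(i) * g / m:       # step 2: step-up with shape function G * beta
            k = i
    if k == 0:
        return []
    return [j for j, p_j in enumerate(p) if p_j <= ordered[k - 1]]


p_values = [0.001, 0.015, 0.035, 0.045, 0.8]
non_adaptive = plug_in_step_up(p_values, 0.05, lambda p: 1.0)   # G = 1 recovers the LSU
oracle_like = plug_in_step_up(p_values, 0.05, lambda p: 2.0)    # hypothetical G = 1 / 0.5
```

Doubling the thresholds lets the third and fourth p-values through, which illustrates the gain in power that adaptivity aims for when $\pi_0$ is well below 1.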

3 Adaptive procedures with provable FDR control under independence 

In this section, we introduce two new adaptive procedures that provably control the FDR under
independence. The first one is one-stage and does not include an explicit estimator of $\pi_0^{-1}$, hence
it is not a plug-in procedure. We then propose to use this as the first stage in a new two-stage
procedure, which constitutes the second proposed method.

For clarity, we first introduce the new one-stage procedure; we then discuss several possible
plug-in procedures, including our new proposition and several procedures proposed by other
authors. FDR control for these various plug-in procedures can be studied using a general theoretical
device introduced by Benjamini et al. (2006) which we reproduce here with a self-contained and
somewhat simplified proof. Finally, to compare these different approaches, we close this section
with extensive simulations which examine both the performance under independence and the
robustness under (possibly strong) positive correlations.

3.1 New adaptive one-stage step-up procedure

We present here our first main contribution, a one-stage adaptive step-up procedure. This means
that the estimation step is implicitly included in the shape function $\beta$.

Theorem 3.1. Suppose that the p-values of $\mathbf{p} = (p_h, h \in \mathcal{H})$ are independent and let
$\lambda \in (0, 1)$ be fixed. Define the adaptive threshold collection

$$\Delta(i) = \min\left((1-\lambda)\frac{\alpha i}{m-i+1},\, \lambda\right). \qquad (5)$$

Then any non-increasing self-consistent procedure with respect to $\Delta$ has FDR upper bounded by
$\alpha$. In particular, this is the case of the corresponding step-up procedure, denoted by BR-1S-$\lambda$.

The above result will be proved in Section 6. Our proof is in part based on Lemma 1 of
Benjamini et al. (2006). Note that an alternate proof of Theorem 3.1 has been established in
Sarkar (2008b) without using this lemma while nicely connecting the FDR upper bound to the
false non-discovery rate.

Below, we will focus on the choice $\lambda = \alpha$, leading to the threshold collection

$$\Delta(i) = \alpha \min\left((1-\alpha)\frac{i}{m-i+1},\, 1\right). \qquad (6)$$

For $i \le (m+1)/2$, the threshold (6) is

$$\alpha\,\frac{(1-\alpha)\, i}{m-i+1},$$

and thus our approach differs from the threshold collection of the standard LSU procedure
by the factor

$$\frac{(1-\alpha)\, m}{m-i+1}.$$

It is interesting to note that the correction factor $\frac{m}{m-i+1}$ appears in Holm's step-down procedure
(Holm, 1979) for FWER control. The latter is a well-known improvement of Bonferroni's procedure
(which corresponds to the fixed threshold $\alpha/m$), taking into account the proportion of true nulls,
and defined as the step-down procedure^1 with threshold collection $\alpha/(m-i+1)$. Here we therefore
prove that this correction is suitable as well for the linear step-up procedure, in the framework of
FDR control.
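The comparison with the LSU thresholds can be made numerically. The sketch below assumes the threshold collection $\min\left((1-\lambda)\alpha i/(m-i+1),\, \lambda\right)$ of Theorem 3.1 with $\lambda = \alpha$; the function names are hypothetical, and the values $m = 1000$, $\alpha = 0.05$ are those used for Figure 1.

```python
def br_one_stage_threshold(i: int, m: int, alpha: float, lam: float) -> float:
    """One-stage adaptive threshold: min((1 - lam) * alpha * i / (m - i + 1), lam)."""
    return min((1.0 - lam) * alpha * i / (m - i + 1), lam)


def lsu_threshold(i: int, m: int, alpha: float) -> float:
    """Linear step-up threshold alpha * i / m."""
    return alpha * i / m


m, alpha = 1000, 0.05
# Ratio of the adaptive threshold to the LSU threshold at a few rejection numbers i.
ratios = {i: br_one_stage_threshold(i, m, alpha, alpha) / lsu_threshold(i, m, alpha)
          for i in (10, 51, 500)}
```

For these values the ratio equals $(1-\alpha)m/(m-i+1)$: it is slightly below one for $i = 10$, crosses one around $i = \alpha m + 1 = 51$, and is close to 1.9 for $i = 500$. For large $i$ the cap at $\lambda$ becomes active, so the threshold never exceeds $\alpha$.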

If $r$ denotes the final number of rejections of the new one-stage procedure, we can interpret
the ratio $\frac{(1-\alpha)m}{m-r+1}$ between the adaptive threshold and the LSU threshold at the same point
as an a posteriori estimate for $\pi_0^{-1}$. In the next section we propose to use this quantity in a plug-in,
two-stage adaptive procedure.

As Figure 1 illustrates, our procedure is generally less conservative than the (non-adaptive)
linear step-up procedure (LSU). Precisely, the new procedure can only be more conservative than
the LSU procedure in the marginal case where the factor $\frac{(1-\alpha)m}{m-i+1}$ is smaller than one. This
happens only when the proportion of null hypotheses rejected by the LSU procedure is positive but less
than $\alpha + 1/m$ (and even in this region the ratio of the two threshold collections is never less than
$(1-\alpha)$). Roughly speaking, this situation with only few rejections can only happen if there are
few false hypotheses to begin with ($\pi_0$ close to 1) or if the false hypotheses are very difficult to
detect (the distribution of false p-values is close to being uniform).

In the interest of being more specific, we briefly investigate this issue in the next lemma,
considering the particular Gaussian random effects model (which is relatively standard in the
multiple testing literature, see e.g. Genovese and Wasserman, 2004) in order to give a quantitative
answer from an asymptotic point of view (when the number of tested hypotheses grows to
infinity). In the random effects model, hypotheses are assumed to be randomly true or false with
probability $\pi_0$, and the false null hypotheses share a common distribution $P_1$. Globally, the
p-values are then i.i.d. draws from the mixture distribution $\pi_0 U[0, 1] + (1 - \pi_0)P_1$.

Lemma 3.2. Consider the random effects model where the p-values are i.i.d. with common
cumulative distribution function $t \mapsto \pi_0 t + (1 - \pi_0)F(t)$. Assume the test statistics are Gaussian
with unit variance, with zero mean under the true null hypotheses and mean $\mu > 0$ under the
alternative hypotheses. In this case $F(t) = \overline{\Phi}(\overline{\Phi}^{-1}(t) - \mu)$, where $\overline{\Phi}$ is the
standard Gaussian upper tail function. Assuming $\pi_0 < (1 + \alpha)^{-1}$, define

$$\mu^* = \overline{\Phi}^{-1}(\alpha^2) - \overline{\Phi}^{-1}\left(\frac{\alpha^2}{1-\pi_0}\left(\alpha^{-1} - \pi_0\right)\right).$$

Then if $\mu > \mu^*$, the probability that the LSU rejects a proportion of null hypotheses less than
$1/m + \alpha$ tends to 0 as $m$ tends to infinity. On the other hand, if $\pi_0 > (1 + \alpha)^{-1}$, or $\mu < \mu^*$, then
this probability tends to one.

^1 The step-down procedure with threshold collection $\Delta$ rejects the hypotheses corresponding to the $k$
smallest p-values, where $k = \max\{0 \le i \le m \mid \forall j \le i,\ p_{(j)} \le \Delta(j)\}$. It is self-consistent with
respect to $\Delta$ but uniformly more conservative than the step-up procedure with the same threshold
collection, compare with Definition 2.4.

Fig. 1. For $m = 1000$ null hypotheses and $\alpha = 5\%$: comparison of the new threshold collection
BR-1S-$\lambda$ given by (5) to that of the LSU, the AORC and FDR09-$\eta$.

For instance, taking $\pi_0 = 0.5$ and $\alpha = 0.05$ in the above lemma results in the critical
value $\mu^* \simeq 1.51$. In this particular case, the lemma clearly delineates the situations in which
we can expect an improvement from the adaptive procedure over the standard LSU.
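
The critical value quoted above can be checked numerically. The following is a minimal sketch, assuming that the definition of Lemma 3.2 reads $\mu^* = \bar\Phi^{-1}(\alpha^2) - \bar\Phi^{-1}\big(\alpha^2(\alpha^{-1}-\pi_0)/(1-\pi_0)\big)$; the helper names `upper_tail_inv` and `mu_star` are hypothetical and only the Python standard library is used.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard Gaussian


def upper_tail_inv(t: float) -> float:
    """Inverse of the standard Gaussian upper tail function."""
    return _STD.inv_cdf(1.0 - t)


def mu_star(pi0: float, alpha: float) -> float:
    """Critical alternative mean of Lemma 3.2 (assumed form, see lead-in)."""
    if not pi0 < 1.0 / (1.0 + alpha):
        raise ValueError("Lemma 3.2 assumes pi0 < 1 / (1 + alpha)")
    level = alpha**2 * (1.0 / alpha - pi0) / (1.0 - pi0)
    return upper_tail_inv(alpha**2) - upper_tail_inv(level)


print(f"mu* = {mu_star(0.5, 0.05):.2f}")
```

With $\pi_0 = 0.5$ and $\alpha = 0.05$ this evaluates to roughly 1.51, in line with the value reported in the text.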

Comparison to other adaptive one-stage procedures. Very recently, other adaptive one-stage
procedures with important similarities to BR-1S-$\lambda$ have been proposed by other authors.
(The present work was developed independently.)

Starting with some heuristic motivations, Finner et al. (2009) proposed the threshold collection
$t(i) = \frac{\alpha i}{m - (1-\alpha) i}$, which they dubbed the asymptotically optimal rejection curve (AORC).
However, the step-up procedure using this threshold collection as is does not have controlled FDR
(since $t(m) = 1$, the corresponding step-up procedure would always reject all the hypotheses), and
several suitable modifications were proposed by Finner et al. (2009), the simplest one being

$$t'_\eta(i) = \min\left( t(i),\ \eta^{-1} \alpha i / m \right),$$

which is denoted by FDR09-$\eta$ in the following.

The theoretical FDR control proved in Finner et al. (2009) is studied asymptotically as the
number of hypotheses grows to infinity. In that framework, asymptotic control at level $\alpha$ is
shown to hold for any $\eta < 1$. In Figure 1, we represent the thresholds BR-1S-$\lambda$ and FDR09-$\eta$
for comparison, for several choices of the parameters. The two families appear quite similar,
initially following the AORC curve, then branching out or capping at a point depending on the
parameter. One noticeable difference in the initial part of the curve is that while FDR09-$\eta$ exactly
coincides with the AORC, BR-1S-$\lambda$ is slightly more conservative. This reflects the nature
of the corresponding theoretical result: non-asymptotic control of the FDR requires a somewhat
more conservative threshold than the merely asymptotic control of FDR09-$\eta$. Moreover, we
can use BR-1S-$\lambda$ as a first step in a 2-step procedure, as will be argued in the next section.

The ratio between BR-1S-$\lambda$ and the AORC (before the capping point) is a factor which,
assuming $\alpha \ge (m+1)^{-1}$, is lower bounded by $(1-\lambda)\left(1 - \frac{1}{m+1}\right)$. This suggests that the
value for $\lambda$ should be kept small, which is why we propose $\lambda = \alpha$ as a default choice.
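
The threshold collections compared here can be tabulated directly. The sketch below is illustrative only: the function names are hypothetical, and the form of the BR-1S-$\lambda$ thresholds, $\min\big((1-\lambda)\alpha i/(m-i+1), \lambda\big)$, is an assumption about equation (5), which is not restated in this section. It checks numerically that the ratio to the AORC, before the cap, stays above $(1-\lambda)(1 - 1/(m+1))$.

```python
def lsu(i, m, alpha):
    """Linear step-up threshold."""
    return alpha * i / m


def aorc(i, m, alpha):
    """Asymptotically optimal rejection curve of Finner et al. (2009)."""
    return alpha * i / (m - (1.0 - alpha) * i)


def fdr09(i, m, alpha, eta):
    """AORC capped at eta^{-1} times the LSU threshold."""
    return min(aorc(i, m, alpha), lsu(i, m, alpha) / eta)


def br_1s(i, m, alpha, lam):
    """BR-1S-lambda threshold (assumed form of equation (5))."""
    return min((1.0 - lam) * alpha * i / (m - i + 1), lam)


m, alpha = 1000, 0.05
lam = alpha
bound = (1.0 - lam) * (1.0 - 1.0 / (m + 1))
# Ratio BR-1S / AORC for the indices where the cap at lambda is not yet active.
ratios = [
    br_1s(i, m, alpha, lam) / aorc(i, m, alpha)
    for i in range(1, m + 1)
    if (1.0 - lam) * alpha * i / (m - i + 1) < lam
]
print(f"smallest ratio {min(ratios):.5f}, lower bound {bound:.5f}")
```

The same functions also make the degenerate behaviour of the unmodified AORC visible, since `aorc(m, m, alpha)` equals 1.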

Finally, the step-down procedure based on the same threshold collection t(i) (without modifi- 
cation) is proposed and studied by Gavrilov et al. (2009). Using specific properties of step-down 
procedures, these authors proved the nonasymptotic FDR control of this procedure. 



3.2 Adaptive plug-in methods 

In this section, we consider different adaptive step-up procedures of the plug-in type, i.e. based
on an explicit estimator of $\pi_0^{-1}$. We first review a general method proposed by Benjamini et al.
(2006) in order to derive FDR control for such plug-in procedures (see also Theorem 4.3 of Finner 
et al., 2009). We propose here a self-contained proof of this result, which is somewhat more 
compact than the original one (and also extends the original result from step-up procedures to 
more generally self-consistent procedures). Based on this result, we review the different plug-in 
estimators considered by Benjamini et al. (2006) and add a new one to the lot, based on the 
one-stage adaptive procedure introduced in the previous section. 

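Let us first introduce the following notation: for each $h \in \mathcal{H}$ we denote by $\mathbf{p}_{-h}$ the collection
of p-values $\mathbf{p}$ restricted to $\mathcal{H} \setminus \{h\}$, that is, $\mathbf{p}_{-h} = (p_{h'}, h' \neq h)$. We also denote by
$\mathbf{p}_{0,h} = (\mathbf{p}_{-h}, 0)$ the collection $\mathbf{p}$ where $p_h$ has been replaced by 0.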

Theorem 3.3 (Benjamini, Krieger, Yekutieli 2006). Suppose that the family of p-values
$\mathbf{p} = (p_h, h \in \mathcal{H})$ is independent. Let $G : [0,1]^{\mathcal{H}} \to (0,\infty)$ be a measurable, coordinate-wise
non-increasing function. Consider a non-increasing multiple testing procedure $R$ which is self-consistent
with respect to the adaptive linear threshold collection $\Delta(\mathbf{p}, i) = \alpha G(\mathbf{p}) i/m$. Then the following
holds:

$$\mathrm{FDR}(R) \le \frac{\alpha}{m} \sum_{h \in \mathcal{H}_0} \mathbb{E}\left[G(\mathbf{p}_{0,h})\right]. \qquad (7)$$

In particular, if for any $h \in \mathcal{H}_0$ it holds that $\mathbb{E}[G(\mathbf{p}_{0,h})] \le \pi_0^{-1}$, then $\mathrm{FDR}(R) \le \alpha$.

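We will apply the above result to the following estimators, depending on a fixed parameter
$\lambda \in (0,1)$ or $k_0 \in \{1,\dots,m\}$:

[Storey-$\lambda$] $G_1(\mathbf{p}) = \dfrac{(1-\lambda)m}{\sum_{h \in \mathcal{H}} \mathbf{1}\{p_h > \lambda\} + 1}$;

[Quant-$k_0/m$] $G_2(\mathbf{p}) = \dfrac{(1 - p_{(k_0)})\,m}{m - k_0 + 1}$;

[BKY06-$\lambda$] $G_3(\mathbf{p}) = \dfrac{(1-\lambda)m}{m - |R_0(\mathbf{p})| + 1}$, where $R_0$ is the standard LSU at level $\lambda$;

[BR-2S-$\lambda$] $G_4(\mathbf{p}) = \dfrac{(1-\lambda)m}{m - |R'_0(\mathbf{p})| + 1}$, where $R'_0$ is BR-1S-$\lambda$ (see Theorem 3.1).

Above, the notations "Storey-$\lambda$", "Quant-$k_0/m$", "BKY06-$\lambda$" and "BR-2S-$\lambda$" refer to the plug-in
adaptive linear step-up procedures associated to $G_1$, $G_2$, $G_3$ and $G_4$, respectively.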
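
To make the plug-in construction concrete, here is a minimal sketch of the four estimators and of the resulting step-up procedure. It assumes the forms $G_1 = (1-\lambda)m/(\#\{p_h > \lambda\}+1)$, $G_2 = (1-p_{(k_0)})m/(m-k_0+1)$, $G_3 = (1-\lambda)m/(m-|R_0|+1)$ and $G_4 = (1-\lambda)m/(m-|R'_0|+1)$, and that BR-1S-$\lambda$ uses the thresholds $\min((1-\lambda)\alpha i/(m-i+1), \lambda)$; all function names and the toy data are hypothetical.

```python
import random


def step_up(pvals, threshold):
    """Number of rejections of the step-up procedure: max{i : p_(i) <= threshold(i)}."""
    k = 0
    for i, p in enumerate(sorted(pvals), start=1):
        if p <= threshold(i):
            k = i
    return k


def g_storey(p, lam):
    m = len(p)
    return (1 - lam) * m / (sum(x > lam for x in p) + 1)


def g_quant(p, k0):
    m = len(p)
    return (1 - sorted(p)[k0 - 1]) * m / (m - k0 + 1)


def g_bky06(p, lam):
    m = len(p)
    r0 = step_up(p, lambda i: lam * i / m)  # first stage: LSU at level lam
    return (1 - lam) * m / (m - r0 + 1)


def g_br2s(p, alpha, lam):
    m = len(p)
    # First stage: BR-1S-lambda (assumed threshold form, see lead-in).
    r0 = step_up(p, lambda i: min((1 - lam) * alpha * i / (m - i + 1), lam))
    return (1 - lam) * m / (m - r0 + 1)


def plug_in_rejections(p, alpha, g):
    """Adaptive linear step-up: thresholds alpha * G(p) * i / m."""
    m = len(p)
    return step_up(p, lambda i: alpha * g * i / m)


rng = random.Random(0)
m, alpha, lam = 100, 0.05, 0.05
# Toy data: 50 uniform (null) p-values and 50 very small (alternative) ones.
p = [rng.random() for _ in range(50)] + [1e-3 * rng.random() for _ in range(50)]
estimates = {
    "Storey": g_storey(p, lam),
    "Quant": g_quant(p, m // 2),
    "BKY06": g_bky06(p, lam),
    "BR-2S": g_br2s(p, alpha, lam),
}
for name, g in estimates.items():
    print(name, round(g, 3), plug_in_rejections(p, alpha, g))
```

Since both first-stage procedures only reject p-values below $\lambda$, the sketch always returns $G_1 \ge G_3$ and $G_1 \ge G_4$ for a common $\lambda$.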

Estimator $G_1$ is usually called the modified Storey's estimator and was initially introduced by Storey
(2002) from a heuristic on the histogram of the p-values (originally without the "+1" in the numerator
of the corresponding estimate of $\pi_0$, hence the name "modified"). Its intuitive justification is as
follows: the set $S_\lambda$ of p-values larger than the threshold $\lambda$ contains on average at least a proportion
$(1-\lambda)$ of the true null hypotheses. Hence, a natural estimator of $\pi_0^{-1}$ is
$(1-\lambda)m/|S_\lambda \cap \mathcal{H}_0| \ge (1-\lambda)m/|S_\lambda| \simeq G_1(\mathbf{p})$. Therefore, we expect Storey's estimator to be
generally an underestimate of $\pi_0^{-1}$. A standard choice is $\lambda = 1/2$
(as in the SAM software of Storey and Tibshirani, 2003). FDR control for the corresponding plug-in
step-up procedure was proved by Storey et al. (2004) (actually, for the modification
$\Delta(\mathbf{p}, i) = \min(\alpha G_1(\mathbf{p}) i/m, \lambda)$) and by Benjamini et al. (2006).

Estimator $G_2$ was introduced by Benjamini and Hochberg (2000) and Efron et al. (2001), from
a slope heuristic on the c.d.f. of the p-values. Roughly speaking, $G_2$ appears as a Storey estimator with
the data-dependent choice $\lambda = p_{(k_0)}$, and can therefore be interpreted as the quantile version of
the Storey estimator. A standard value for $k_0$ is $\lfloor m/2 \rfloor$, resulting in the so-called median adaptive
LSU (see Benjamini et al. (2006) and the references therein).

Estimator $G_3$ was introduced by Benjamini et al. (2006) for the particular choice $\lambda = \alpha/(1+\alpha)$.
More precisely, a slightly less conservative version, without the "+1" in the denominator, was used
in Benjamini et al. (2006). We do not pursue this refinement here, noting that it results only in a
very slight improvement.

Finally, the estimator $G_4$ is new and follows exactly the same philosophy as $G_3$, that is, it uses
a step-up procedure as a first stage in order to estimate $\pi_0^{-1}$, but this time based on our adaptive
one-stage step-up procedure introduced in the previous section, rather than the standard LSU.
Note that since $R'_0$ is less conservative than $R_0$ (except in marginal cases), we generally have
$G_3 \le G_4$ pointwise and our estimator improves over the one of Benjamini et al. (2006).

These different estimators all satisfy the sufficient condition mentioned in Theorem 3.3, and
we thus obtain the following corollary:

Corollary 3.4. Assume that the p-values of $\mathbf{p} = (p_h, h \in \mathcal{H})$ are independent. For $i = 1,2,3,4$,
and any $h \in \mathcal{H}_0$, it holds that $\mathbb{E}[G_i(\mathbf{p}_{0,h})] \le \pi_0^{-1}$. Therefore, the plug-in adaptive linear step-up
procedure at level $\alpha$ using estimator $G_i$ has FDR at most $\alpha$.

The above result for $G_1$, $G_2$ and $G_3$ (for $\lambda = \alpha/(1+\alpha)$) was proved by Benjamini et al. (2006).
For completeness, we briefly reproduce the corresponding arguments in the appendix.

In other words, Corollary 3.4 states that under independence, for any $\lambda$ and $k_0$, the plug-in
adaptive procedures Storey-$\lambda$, Quant-$k_0/m$, BKY06-$\lambda$ and BR-2S-$\lambda$ all control the FDR at level $\alpha$.

Remark 3.5. The result proved by Benjamini et al. (2006) is actually slightly sharper than Theorem 3.3.
Namely, if $G(\cdot)$ is moreover supposed to be coordinate-wise left-continuous, it is possible
to prove that Theorem 3.3 still holds when $\mathbf{p}_{0,h}$ in the right-hand side of (7) is replaced by the slightly
better $\tilde{\mathbf{p}}_h = (\mathbf{p}_{-h}, \tilde p_h(\mathbf{p}_{-h}))$, defined as the collection of p-values $\mathbf{p}$ where $p_h$ has been replaced
by $\tilde p_h(\mathbf{p}_{-h}) = \max\{ p \in [0,1] \mid p \le \alpha \pi(h) |R(\mathbf{p}_{-h}, p)| G(\mathbf{p}_{-h}, p) \}$. This improvement then makes
it possible to get rid of the "+1" in the denominator of $G_3$. Here, we opted for simplicity and a more
straightforward statement, noting that this improvement is not crucial.

Remark 3.6. The one-stage step-up procedure of Finner et al. (2009) (see the discussion
at the end of Section 3.1), for which to our knowledge there is no result proving non-asymptotic FDR
control, can also be interpreted intuitively as an adaptive version of the LSU using
estimator $G_2$, where the choice of parameter $k_0$ is data-dependent. Namely, assume we reject at
least $i$ null hypotheses whenever $p_{(i)}$ is lower than the standard LSU threshold times the estimator
$G_2$ wherein parameter $k_0 = i$ is used. This corresponds to the inequality
$p_{(i)} \le \alpha \frac{i}{m} \cdot \frac{(1 - p_{(i)})\,m}{m - i + 1}$, which,
solved in $p_{(i)}$, gives the threshold collection of Finner et al. (2009) (up to the additive term 1 in the
denominator). Remember from Section 3.1
that this threshold collection must actually be modified in order to be useful, since it otherwise
always leads to rejecting all hypotheses. The modification leading to FDR09-$\eta$ consists in capping the
estimated $\pi_0^{-1}$ at the level $\eta^{-1}$, i.e. using $\min(\eta^{-1}, G_2)$ instead of $G_2$ in the above reasoning. In fact, the
proof of Finner et al. (2009) relies on a result which is essentially a reformulation of Theorem 3.3
for a specific form of estimator.

Remark 3.7. The estimators $G_i$, $i = 1, 2, 3, 4$ are not necessarily larger than 1, and to this extent
can in some unfavorable cases result in the final procedure being actually more conservative than
the standard LSU. This can only happen in the situation where either $\pi_0$ is close to 1 ("sparse
signal") or the alternative hypotheses are difficult to detect ("weak signal"); if such a situation is
anticipated, it is more appropriate to use the regular non-adaptive LSU.
For the Storey-$\lambda$ estimator, we can control precisely the probability that such an unfavorable
case arises by using Hoeffding's inequality (Hoeffding, 1963): assuming the true nulls are i.i.d.
uniform on $(0,1)$ and the false nulls i.i.d. of c.d.f. $F(\cdot)$, we write by definition of $G_1$

$$\mathbb{P}\left[G_1(\mathbf{p}) < 1\right] = \mathbb{P}\left[ \frac{1}{m} \sum_{h \in \mathcal{H}} \left( \mathbf{1}\{p_h > \lambda\} - \mathbb{P}[p_h > \lambda] \right) > (1-\pi_0)(F(\lambda) - \lambda) - \frac{1}{m} \right] \le \exp\left(-2(mc^2 + 1)\right), \qquad (8)$$

where we denoted $c = (1-\pi_0)(F(\lambda) - \lambda)$, and assumed additionally $c > 0$. The behavior of
the bound mainly depends on $c$, which can get small only if $\pi_0$ is close to 1 (sparse signal) or
$F(\lambda)$ is close to $\lambda$ (weak signal), illustrating the above point. In general, provided $c > 0$ does not
depend on $m$, the probability that the Storey procedure fails to outperform the LSU vanishes
exponentially as $m \to \infty$.



3.3 Theoretical robustness of the adaptive procedures under maximal dependence 

For the different procedures proposed above, the theory only provides the correct FDR control
under independence between the p-values. An important issue is to know how robust this control
is when dependences are present (as is often the case in practice). However, the analytic
computation of the FDR under dependence is generally a difficult task, and this issue is often tackled
empirically through simulations in a pre-specified model (we will do so in Section 3.4).

In this short section, we present theoretical computations of the FDR for the previously
introduced adaptive step-up procedures, under the maximally dependent model where all the p-values
are in fact equal, that is $p_h = p_1$ for all $h \in \mathcal{H}$ (and $m_0 = m$). It corresponds to the case where
we perform $m$ times the same test, with the same p-value. Albeit relatively trivial and limited,
this case leads to very simple FDR computations and provides at least some hints concerning the
robustness under dependence of the different procedures studied above.

Proposition 3.8. Suppose that we observe $m$ identical p-values $\mathbf{p} = (p_1, \dots, p_m) = (p_1, \dots, p_1)$
with $p_1 \sim U([0,1])$ and assume $m = m_0$. Then, the following holds:

$$\mathrm{FDR}(\text{BR-1S-}\lambda) = \min(\lambda, (1-\lambda)\alpha m),$$
$$\mathrm{FDR}(\text{FDR09-}\eta) = \min(\alpha\eta^{-1}, 1),$$
$$\mathrm{FDR}(\text{Storey-}\lambda) = \min(\lambda, \alpha(1-\lambda)m) + \left(\alpha(1-\lambda)(1+m^{-1})^{-1} - \lambda\right)_+,$$
$$\mathrm{FDR}(\text{Quant-}k_0/m) = \frac{\alpha}{(1+\alpha) - (k_0 - 1)m^{-1}},$$
$$\mathrm{FDR}(\text{BKY06-}\lambda) = \mathrm{FDR}(\text{BR-2S-}\lambda) = \mathrm{FDR}(\text{Storey-}\lambda).$$

Interestingly, the above proposition suggests specific choices of the parameters $\lambda$, $\eta$ and $k_0$ to
ensure control of the FDR at level $\alpha$ under maximal dependence:

• For BR-1S-$\lambda$, putting $\lambda_2 = \alpha/(\alpha + m^{-1})$, Proposition 3.8 gives that $\mathrm{FDR}(\text{BR-1S-}\lambda) = \lambda$
whenever $\lambda \le \lambda_2$. This suggests taking $\lambda = \alpha$, and is thus in accordance with the default
choice proposed in Section 3.1.

• For FDR09-$\eta$, no choice of $\eta < 1$ will lead to the correct FDR control under maximal dependence.
However, the larger $\eta$, the smaller the FDR in this situation. Note that $\mathrm{FDR}(\text{FDR09-}1/2) = 2\alpha$.

• For Storey-$\lambda$, BKY06-$\lambda$ and BR-2S-$\lambda$, putting $\lambda_1 = \alpha/(1 + \alpha + m^{-1})$, we have $\mathrm{FDR} = \lambda$ for
$\lambda_1 \le \lambda \le \lambda_2$. This suggests choosing $\lambda = \alpha$ within these three procedures. Furthermore, note
that the standard choice $\lambda = 1/2$ for Storey-$\lambda$ leads to a very poor control under maximal
dependence: $\mathrm{FDR}(\text{Storey-}1/2) = \min(\alpha m, 1)/2$.

• For Quant-$k_0/m$, we see that the value of $k_0$ maximizing the FDR while maintaining it
below $\alpha$ is $k_0 = \lfloor \alpha m \rfloor + 1$. Note also that the standard choice $k_0 = \lfloor m/2 \rfloor$ leads to
$\mathrm{FDR}(\text{Quant-}k_0/m) = 2\alpha/(1 + 2\alpha + 2m^{-1}) \simeq 2\alpha$.
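
The closed forms for the maximally dependent model are easy to sanity-check numerically. The sketch below is an illustration under stated assumptions, not the authors' code: it takes the Storey and Quant estimators in the forms $G_1 = (1-\lambda)m/(\#\{p_h>\lambda\}+1)$ and $G_2 = (1-p_{(k_0)})m/(m-k_0+1)$, runs the plug-in step-up procedures on $m$ identical p-values over a fine grid of values of $p_1$, and compares the fraction of grid points with at least one rejection to the formulas $\min(\lambda, \alpha(1-\lambda)m) + (\alpha(1-\lambda)(1+m^{-1})^{-1}-\lambda)_+$ and $\alpha/((1+\alpha)-(k_0-1)m^{-1})$. All function names are hypothetical.

```python
def step_up(pvals, threshold):
    """Number of rejections of the step-up procedure with thresholds threshold(i)."""
    k = 0
    for i, p in enumerate(sorted(pvals), start=1):
        if p <= threshold(i):
            k = i
    return k


def storey_rejections(pvals, alpha, lam):
    m = len(pvals)
    g1 = (1 - lam) * m / (sum(p > lam for p in pvals) + 1)
    return step_up(pvals, lambda i: alpha * g1 * i / m)


def quant_rejections(pvals, alpha, k0):
    m = len(pvals)
    g2 = (1 - sorted(pvals)[k0 - 1]) * m / (m - k0 + 1)
    return step_up(pvals, lambda i: alpha * g2 * i / m)


def fdr_storey(alpha, lam, m):
    return min(lam, alpha * (1 - lam) * m) + max(alpha * (1 - lam) / (1 + 1 / m) - lam, 0.0)


def fdr_quant(alpha, k0, m):
    return alpha / ((1 + alpha) - (k0 - 1) / m)


def fdr_on_grid(rejections, m, n_grid=4000):
    # All hypotheses are true nulls, so the FDP equals 1 as soon as anything is
    # rejected; averaging over a uniform grid of p_1 approximates the FDR.
    hits = sum(rejections([(j + 0.5) / n_grid] * m) > 0 for j in range(n_grid))
    return hits / n_grid


m, alpha = 50, 0.05
checks = []  # pairs (grid value, closed form)
for lam in (alpha, 0.5):
    numeric = fdr_on_grid(lambda p: storey_rejections(p, alpha, lam), m)
    checks.append((numeric, fdr_storey(alpha, lam, m)))
k0 = m // 2
checks.append((fdr_on_grid(lambda p: quant_rejections(p, alpha, k0), m),
               fdr_quant(alpha, k0, m)))
for numeric, exact in checks:
    print(f"grid {numeric:.4f}  formula {exact:.4f}")
```

The grid resolution limits the agreement to about $1/n_{\text{grid}}$ per interval endpoint, which is why the comparison is only approximate.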

Nevertheless, we would like to underline that the above computations should be interpreted 
with caution, as the maximal dependence case is very specific and cannot possibly give an accurate 
idea of the behavior of the different procedures when the correlations between the p-values are
strong but not equal to 1. For instance, it is well-known that the LSU procedure has FDR far
below a for strong positive correlations, but its FDR is equal to a in the above extreme model (see 
Finner et al., 2007, for a comprehensive study of the LSU under positive dependence). Conversely, 
the FDR of some adaptive procedures can be higher under moderate dependence than under 
maximal dependence. This behavior appears in the simulations of the next section, illustrating the 
complexity of the issue. 

3.4 Simulation study 

How can we compare the different adaptive procedures defined above? For a fixed $\lambda$, we have
pointwise $G_1 \ge G_4 \ge G_3$, which shows that the adaptive procedure [Storey-$\lambda$] is always less
conservative than [BR-2S-$\lambda$], itself less conservative than [BKY06-$\lambda$] (except in the marginal cases
where the one-stage adaptive procedure is more conservative than the standard step-up procedure,
as delineated earlier for example in Lemma 3.2). It would therefore appear that one should always
choose [Storey-$\lambda$] and disregard the other ones. However, an important point made by Benjamini
et al. (2006) for introducing $G_3$ as a better alternative to the (already known earlier) $G_1$ is that, on
simulations with positively dependent test statistics, the plug-in procedure using $G_1$ with $\lambda = 1/2$
had very poor control of the FDR, while the FDR was still controlled for the plug-in procedure
based on $G_3$. While the positively dependent case is not covered by the theory, it is of course very
important to ensure that a multiple testing procedure is sufficiently robust in practice so that the
FDR does not vary too much in this situation.

In order to assess the quality of our new procedures, we compare here the different methods
on a simulation study following the setting used by Benjamini et al. (2006). Let $X_i = \mu_i + \varepsilon_i$, for
$1 \le i \le m$, where $\varepsilon$ is an $\mathbb{R}^m$-valued centred Gaussian random vector such that $\mathbb{E}(\varepsilon_i^2) = 1$ and
for $i \ne j$, $\mathbb{E}(\varepsilon_i \varepsilon_j) = \rho$, where $\rho \in [0,1]$ is a correlation parameter. Thus, when $\rho = 0$ the $X_i$'s
are independent, whereas when $\rho > 0$ the $X_i$'s are positively correlated (with a constant pairwise
correlation). For instance, the $\varepsilon_i$'s can be constructed by taking $\varepsilon_i := \sqrt{\rho}\,U + \sqrt{1-\rho}\,Z_i$, where
$Z_i$, $1 \le i \le m$, and $U$ are all i.i.d. $\mathcal{N}(0,1)$.

Considering the one-sided null hypotheses $h_i$: "$\mu_i \le 0$" against the alternatives "$\mu_i > 0$" for
$1 \le i \le m$, we define the p-values $p_i = \bar\Phi(X_i)$, for $1 \le i \le m$, where $\bar\Phi$ is the standard Gaussian
distribution tail. We choose a common mean $\bar\mu$ for all false hypotheses, that is, for $1 \le i \le m_0$,
$\mu_i = 0$ and for $m_0 + 1 \le i \le m$, $\mu_i = \bar\mu$; the p-values corresponding to the null means follow
exactly a uniform distribution.

Note that the case $\rho = 1$ and $m = m_0$ (i.e. $\pi_0 = 1$) corresponds to the maximally dependent
case studied in Section 3.3.
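
The data-generating model can be sketched in a few lines. The code below is a minimal illustration, assuming the one-factor construction $\varepsilon_i = \sqrt{\rho}\,U + \sqrt{1-\rho}\,Z_i$ and one-sided p-values $p_i = \bar\Phi(X_i)$; the function name `simulate_pvalues` and its argument names are hypothetical, and only the standard library is used.

```python
import math
import random
from statistics import NormalDist


def simulate_pvalues(m, m0, mu_bar, rho, rng):
    """One draw of m one-sided p-values; the first m0 hypotheses are true nulls."""
    std = NormalDist()
    u = rng.gauss(0.0, 1.0)  # shared factor U, common to all coordinates
    pvals = []
    for i in range(m):
        z = rng.gauss(0.0, 1.0)  # idiosyncratic factor Z_i
        eps = math.sqrt(rho) * u + math.sqrt(1.0 - rho) * z
        mean = 0.0 if i < m0 else mu_bar
        pvals.append(1.0 - std.cdf(mean + eps))  # Gaussian upper tail at X_i
    return pvals


rng = random.Random(0)
p = simulate_pvalues(m=100, m0=50, mu_bar=3.0, rho=0.2, rng=rng)
print(min(p), max(p))
```

Setting `rho=1.0` and `m0=m` makes all p-values identical, which is the maximally dependent case.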

We compare the following step-up multiple testing procedures. First, the one-stage step-up
procedures defined in Section 3.1:

- [BR08-1S-$\alpha$] The new procedure of Theorem 3.1, with parameter $\lambda = \alpha$,

- [FDR09-1/2] The procedure proposed in Finner et al. (2009) and described in Section 3.1, with
$\eta = 1/2$.

Second, the adaptive plug-in step-up procedures defined in Section 3.2:

- [Median LSU] The procedure [Quant-$k_0/m$] with the choice $k_0/m = 1/2$,

- [BKY06-$\alpha$] The procedure [BKY06-$\lambda$] with the parameter choice $\lambda = \alpha$,

- [BR08-2S-$\alpha$] The procedure [BR-2S-$\lambda$] with the parameter choice $\lambda = \alpha$,

- [Storey-$\lambda$] With the choices $\lambda = 1/2$ and $\lambda = \alpha$.

Finally, we used as oracle reference [LSU Oracle], the step-up procedure with the threshold
collection $\Delta(i) = \alpha i/m_0$, using a perfect estimation of $\pi_0$.

The parameter choice $\lambda = \alpha$ for [Storey-$\lambda$] comes from the relationship of $G_3$, $G_4$ to $G_1$ in
Section 3.1, and from the discussion of the maximally dependent case in Section 3.3. Note that the
procedure studied by Benjamini et al. (2006) is actually [BKY06-$\alpha/(1+\alpha)$] in our notation (up to
the very slight modification explained in Remark 3.5). This means that the procedure [BKY06-$\alpha$]
used in our simulations is not exactly the same as in Benjamini et al. (2006), but it is very close.

The three most important parameters in the simulation are the correlation coefficient $\rho$, the
proportion of true null hypotheses $\pi_0$, and the alternative mean $\bar\mu$, which represents the
signal-to-noise ratio, or how easy it is to distinguish alternative hypotheses. We present in Figures 2, 3,
and 4 results of the simulations for one varying parameter ($\pi_0$, $\bar\mu$ and $\rho$, respectively), the others
being kept fixed. Reported are, for the different methods: the average FDR, and the average power
relative to the reference [LSU-Oracle]. The absolute power is defined as the average proportion of
false null hypotheses rejected, and the relative power as the mean of the number of true rejections
of the procedure divided by the number of true rejections of [LSU-Oracle]. Each point is an average
of 10^ simulations, with fixed parameters $m = 100$ and $\alpha = 5\%$.
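
The two reported quantities can be estimated by Monte Carlo as sketched below for the plain LSU under independence. This is a hedged illustration rather than the simulation code of the paper: the helper names are hypothetical, the number of replications (1000) is an arbitrary choice, and the relative power is computed here as a ratio of total true rejections, which is one possible reading of the definition above.

```python
import random
from statistics import NormalDist


def step_up_rejected(pvals, threshold):
    """Indices rejected by the step-up procedure with thresholds threshold(i)."""
    order = sorted(range(len(pvals)), key=lambda h: pvals[h])
    k = 0
    for i, h in enumerate(order, start=1):
        if pvals[h] <= threshold(i):
            k = i  # last index where p_(i) <= threshold(i)
    return set(order[:k])


def fdp(rejected, m0):
    """False discovery proportion; hypotheses 0..m0-1 are the true nulls."""
    return sum(h < m0 for h in rejected) / max(len(rejected), 1)


def true_rejections(rejected, m0):
    return sum(h >= m0 for h in rejected)


rng = random.Random(1)
std = NormalDist()
m, m0, alpha, mu_bar, n_rep = 100, 50, 0.05, 3.0, 1000
fdr_lsu, tr_lsu, tr_oracle = 0.0, 0, 0
for _ in range(n_rep):
    means = [0.0] * m0 + [mu_bar] * (m - m0)
    p = [1.0 - std.cdf(mu + rng.gauss(0.0, 1.0)) for mu in means]
    lsu = step_up_rejected(p, lambda i: alpha * i / m)
    oracle = step_up_rejected(p, lambda i: alpha * i / m0)
    fdr_lsu += fdp(lsu, m0) / n_rep
    tr_lsu += true_rejections(lsu, m0)
    tr_oracle += true_rejections(oracle, m0)

relative_power = tr_lsu / tr_oracle
print(f"FDR(LSU) ~ {fdr_lsu:.3f}, relative power ~ {relative_power:.3f}")
```

Under independence the estimated FDR of the LSU should be close to $\alpha\pi_0 = 0.025$ for these parameters, and its relative power cannot exceed 1 because the oracle thresholds dominate the LSU thresholds.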

Under independence ($\rho = 0$). Remember that under independence of the p-values, the LSU
procedure has FDR equal to $\alpha\pi_0$ and that the LSU Oracle procedure has FDR equal to $\alpha$ (provided
that $\alpha \le \pi_0$). The other procedures have their FDR upper bounded by $\alpha$ (in an asymptotic
sense only for [FDR09-1/2]).

The situation where the p-values are independent corresponds to the first row of Figures 2 and
3 and the leftmost point of each graph in Figure 4. It appears that in the independent case, the
following procedures can be consistently ordered in terms of (relative) power over the range of
parameters studied here:

[Storey-1/2] $\succ$ [Storey-$\alpha$] $\succ$ [BR08-2S-$\alpha$] $\succ$ [BKY06-$\alpha$],

the symbol "$\succ$" meaning "is (uniformly over our experiments) more powerful than".

Next, the procedures [median-LSU] and [FDR09-1/2] appear both consistently less powerful
than [Storey-1/2], and [FDR09-1/2] is additionally also consistently less powerful than [Storey-$\alpha$].
Their relation to the remaining procedures depends on the parameters; both [median-LSU] and
[FDR09-1/2] appear to be more powerful than the remaining procedures when $\pi_0 > 1/2$, and less
efficient otherwise. We note that [median-LSU] also appears to perform better when $\bar\mu$ is low (i.e.
the alternative hypotheses are harder to distinguish).

Concerning our one-stage procedure [BR08-1S-$\alpha$], we note that it appears to be indistinguishable
from its two-stage counterpart [BR08-2S-$\alpha$] when $\pi_0 > 1/2$, and significantly less powerful
otherwise. This also corresponds to our expectations, since in the situation $\pi_0 < 1/2$, there is a
much higher likelihood that more than 50% of the hypotheses are rejected, in which case our one-stage
threshold family hits its "cap" at level $\alpha$ (see e.g. Fig. 1; a similar qualitative explanation applies
to understand the behavior of [FDR09-1/2]). It is precisely to improve on this situation that we
introduced the 2-stage procedure, and we see that it does in fact improve substantially on the 1-stage
version in that specific region.

The fact that [Storey-1/2] is uniformly more powerful than the other procedures in the independent
case corroborates the simulations reported in Benjamini et al. (2006). Generally speaking,
under independence we obtain a less biased estimate for $\pi_0^{-1}$ when considering Storey's estimator
based on a "high" threshold like $\lambda = 1/2$. Namely, higher p-values are less likely to be "contaminated"
by false null hypotheses; conversely, if we take a lower threshold $\lambda$, there will be more
false null hypotheses included in the set of p-values larger than $\lambda$, leading to a pessimistic bias
in the estimation of $\pi_0^{-1}$. This qualitative reasoning is also consistent with the observed behavior
of [median-LSU], since the set of p-values larger than the median is much more likely to be
"contaminated" when $\pi_0 < 1/2$.

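The effect of the threshold on the bias can be quantified in the Gaussian random effects model. As a rough sketch, and assuming independent p-values with alternative c.d.f. $F(t) = \bar\Phi(\bar\Phi^{-1}(t) - \bar\mu)$, the fraction $|S_\lambda|/((1-\lambda)m)$ (the reciprocal of Storey's estimator, ignoring the "+1") concentrates for large $m$ around $\pi_0 + (1-\pi_0)(1-F(\lambda))/(1-\lambda)$. The function names below are hypothetical.

```python
from statistics import NormalDist

_STD = NormalDist()


def alt_cdf(t, mu):
    """c.d.f. F(t) of a one-sided p-value under the Gaussian alternative with mean mu."""
    quantile = _STD.inv_cdf(1.0 - t)       # upper-tail quantile of t
    return 1.0 - _STD.cdf(quantile - mu)   # upper tail evaluated at quantile - mu


def limiting_pi0_estimate(pi0, lam, mu):
    """Large-m limit of |S_lambda| / ((1 - lambda) m) in the random effects model."""
    return pi0 + (1.0 - pi0) * (1.0 - alt_cdf(lam, mu)) / (1.0 - lam)


for lam in (0.5, 0.05):
    print(f"lambda = {lam}: limit {limiting_pi0_estimate(0.5, lam, 3.0):.4f} (true pi0 = 0.5)")
```

For $\pi_0 = 0.5$ and $\bar\mu = 3$ the limit stays very close to 0.5 when $\lambda = 1/2$ and is visibly larger when $\lambda = 0.05$, which matches the qualitative argument above.
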
However, the problem with [Storey-1/2] is that the corresponding estimation of $\pi_0^{-1}$ exhibits
much more variability than its competitors when there is a substantial correlation between the
p-values. As a consequence it is a very fragile procedure. This phenomenon was already pinpointed
in Benjamini et al. (2006) and we study it next.

Under positive dependences ($\rho > 0$). Under positive dependences, remember that it is known
theoretically from Benjamini and Yekutieli (2001) that the FDR of the procedure LSU (resp.
LSU Oracle) is still bounded by $\alpha\pi_0$ (resp. $\alpha$), but without equality. However, we do not know
from a theoretical point of view whether the adaptive procedures have their FDR upper bounded by
$\alpha$. In fact, it was pointed out by Farcomeni (2007), in another work reporting simulations on
adaptive procedures, that one crucial point for these seems to be the variability of the estimate of
$\pi_0^{-1}$. Estimates of this quantity that are not robust with respect to positive dependence will result
in failures for the corresponding multiple testing procedure.

The situation where the p-values are positively dependent corresponds to the second and third
rows ($\rho = 0.2$ and $\rho = 0.5$, respectively) of Figures 2 and 3 and to all the graphs of Figure 4.

The most striking fact is that [Storey-1/2] does not control the FDR at the desired level any
longer under positive dependences, and can even be off by quite a large factor. This is in accordance
with the experimental findings of Benjamini et al. (2006). Therefore, although this procedure was
the favorite in the independent case, it turns out to be not robust, which is very undesirable for
practical use where it is generally impossible to guarantee that the p-values are independent. The
procedure [median-LSU] appears to have higher power than the remaining ones in the situations
studied in Figure 3, especially with a low signal-to-noise ratio. Unfortunately, other situations
appearing in Figures 2 and 4 show that [median-LSU] can exhibit a poor FDR control in some
parameter regions, most notably when $\pi_0$ is close to 1 and positive dependences are present (see,
e.g., Fig. 4, bottom row). In a majority of practical situations, it is difficult to rule out a priori that
$\pi_0$ could be close to 1 (i.e., there is only a small proportion of false hypotheses), or that
dependences are present. For these reasons, our conclusion is that [median-LSU] is also not robust enough
in general to be reliable.

The other remaining procedures seem to still have a controlled FDR, or at least to be very
close to the FDR target level (except for [FDR09-1/2] when $\rho$ and $\pi_0$ are close to 1). For these
it seems that the qualitative conclusions concerning power comparison found in the independent
case remain true. To sum up:

- the best overall procedure seems to be [Storey-$\alpha$]: its FDR seems to be under or only slightly
over the target level in all situations, and it exhibits globally a power superior to other procedures.

- then come, in order of power, our 2-stage procedure [BR08-2S-$\alpha$], then [BKY06-$\alpha$].

- as in the independent case, [FDR09-1/2] ranks second when $\pi_0 > 1/2$ but tends to perform
noticeably poorer if $\pi_0$ gets smaller. Its FDR is also not controlled if very strong correlations are
present.

To conclude, the recommendation that we draw from these experiments for practical use is to
give priority to [Storey-$\alpha$], then as close seconds [BR08-2S-$\alpha$] or [FDR09-1/2] (the latter
when it is expected that $\pi_0 > 1/2$, and that there are no very strong correlations present). The
procedure [BKY06-$\alpha$] is also competitive but appears to be in most cases noticeably outperformed
by the above ones. These procedures all exhibit good robustness to dependence for FDR control
as well as comparatively good power. The fact that [Storey-$\alpha$] performs so well and seems to hold
the favorite position has to our knowledge not been reported before (it was not included in the
simulations of Benjamini et al., 2006) and came somewhat as a surprise to us.

Remark 3.9. As pointed out earlier, the fact that [FDR09-1/2] performs sub-optimally for $\pi_0 < 1/2$
appears to be strongly linked to the choice of parameter $\eta = 1/2$. Namely, the implicit estimator
of $\pi_0^{-1}$ in the procedure is capped at $\eta^{-1}$ (see Remark 3.6). Choosing a higher value for this cap will
reduce the sub-optimality region but increase the variability of the estimate and thus decrease the
overall robustness of the procedure (if dependences are present; and also under independence if
only a small number of hypotheses $m$ are tested, as for this procedure the convergence of the FDR
towards its asymptotically controlled value becomes slower as this cap is raised).
Fig. 2. FDR and power relative to oracle as a function of the true proportion $\pi_0$ of null hypotheses. Target
FDR is $\alpha = 5\%$, total number of hypotheses $m = 100$. The mean for the alternatives is $\bar\mu = 3$. From top
to bottom: pairwise correlation coefficient $\rho \in \{0, 0.2, 0.5\}$.

Fig. 3. FDR and power relative to oracle as a function of the common alternative hypothesis mean $\bar\mu$.
Target FDR is $\alpha = 5\%$, total number of hypotheses $m = 100$. The proportion of true null hypotheses is
$\pi_0 = 0.5$. From top to bottom: pairwise correlation coefficient $\rho \in \{0, 0.2, 0.5\}$.
Remark 3.10. Another 2-stage adaptive procedure was introduced in Sarkar (2008a), which is very
similar to a plug-in procedure using [Storey-$\lambda$]. In fact, in the experiments presented in Sarkar
(2008a), the two procedures are almost equivalent, corresponding to $\lambda = 0.995$. We decided not to
include this additional procedure in our simulations to avoid overloading the plots. Qualitatively,
we observed that the procedures of Sarkar (2008a) or [Storey-0.995] are very similar in behavior
to [Storey-1/2]: they perform very well in the independent case but are very fragile with respect to
deviations from independence.

Remark 3.11. One could formulate the concern that the observed FDR control for [Storey-$\alpha$] could
possibly fail with other parameter settings, for example when $\pi_0$ and/or $\rho$ are close to one. We
performed additional simulations in this regard, which we summarize briefly here. We considered
the following cases: $\pi_0 = 0.95$ and varying $\rho \in [0,1]$; $\rho = 0.95$ and varying $\pi_0 \in [0,1]$; finally $(\pi_0, \rho)$
varying both in $[0.8, 1]^2$, using a finer discretization grid to cover this region in more detail. In
all the above cases [Storey-$\alpha$] still had its FDR very close to (or below) $\alpha$. Note also that the case
$\rho \simeq 1$ and $\pi_0 \simeq 1$ is in accordance with the result of Section 3.3, stating that $\mathrm{FDR}(\text{Storey-}\alpha) = \alpha$
when $\rho = 1$ and $\pi_0 = 1$. Finally, we also performed additional experiments for different choices of
the number of hypotheses to test (m = 20 and m = 10^) and different choices of the target level
($\alpha = 10\%$, $1\%$). In all of these cases the results were qualitatively in accordance with the ones
already presented here.

4 New adaptive procedures with provable FDR control under arbitrary dependence

In this section, we consider from a theoretical point of view the problem of constructing multiple
testing procedures that are adaptive to $\pi_0$ under arbitrary dependence conditions of the p-values.
The derivation of adaptive procedures that have provably controlled FDR under dependences
appears to have been studied only scarcely (see Sarkar, 2008a, and Farcomeni, 2007). Here, we
propose to use a 2-stage procedure where the first stage is a multiple testing procedure with either
controlled FWER or controlled FDR. The first option is relatively straightforward and is intended as a
reference. In the second case, we use Markov's inequality to estimate $\pi_0^{-1}$. Since Markov's inequality
is general but not extremely precise, the resulting procedures are obviously quite conservative
and are arguably of limited practical interest. However, we will show that they still provide an
improvement, in a certain regime, with respect to the (non-adaptive) LSU procedure in the PRDS
case and with respect to the family of (non-adaptive) procedures proposed in Theorem 2.7 in the
arbitrary dependences case.

For the purposes of this section, we first recall the formal definition for PRDS dependence of 
Benjamini and Yekutieli (2001): 

Definition 4.1 (PRDS condition). Remember that a set $D \subset [0,1]^{\mathcal{H}}$ is said to be non-decreasing
if for all $x, y \in [0,1]^{\mathcal{H}}$, $x \le y$ coordinate-wise and $x \in D$ implies $y \in D$. Then, the p-values
$\mathbf{p} = (p_h, h \in \mathcal{H})$ are said to be positively regressively dependent on each one from $\mathcal{H}_0$ (PRDS on $\mathcal{H}_0$ in
short) if for any non-decreasing measurable set $D \subset [0,1]^{\mathcal{H}}$ and for all $h \in \mathcal{H}_0$, the function
$u \in [0,1] \mapsto \mathbb{P}(\mathbf{p} \in D \mid p_h = u)$ is non-decreasing.

On the one hand, it was proved by Benjamini and Yekutieli (2001) that the LSU still has controlled
FDR at level $\pi_0\alpha$ (i.e., Theorem 2.6 still holds) under the PRDS assumption. On the other hand,
under totally arbitrary dependences this result does not hold, and Theorem 2.7 provides a family
of threshold collections resulting in controlled FDR at the same level in this case.

Our first result concerns a two-stage procedure where the first stage $R_0$ is any multiple testing
procedure with controlled FWER, and where we (over-)estimate $m_0$ via the straightforward
estimator $(m - |R_0|)$. This should be considered as a form of baseline reference for this type of
two-stage procedure.

Theorem 4.2. Let $R_0$ be a non-increasing multiple testing procedure and assume that its FWER
is controlled at level $\alpha_0$, that is, $\mathbb{P}[R_0 \cap \mathcal{H}_0 \ne \emptyset] \le \alpha_0$. Then the adaptive step-up procedure $R$
with data-dependent threshold collection $\Delta(i) = \alpha_1 (m - |R_0|)^{-1} \beta(i)$ has FDR controlled at level
$\alpha_0 + \alpha_1$ in either of the following dependence situations:

- the p-values $(p_h, h \in \mathcal{H})$ are PRDS on $\mathcal{H}_0$ and the shape function is the identity function.

- the p-values have unspecified dependences and $\beta$ is a shape function of the form (4).

Here it is clear that the price for adaptivity is a certain loss in FDR control for being able to use the information of the first stage. If we choose $\alpha_0 = \alpha_1 = \alpha/2$, then this procedure will outperform its non-adaptive counterpart (using the same shape function) only if more than 50% of the hypotheses are rejected in the first stage. Only if it is expected that this situation will occur does it make sense to employ this procedure, since it will otherwise perform worse than the non-adaptive procedure.
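As an illustration only, the two-stage scheme of Theorem 4.2 can be sketched in a few lines of Python for the identity shape function. The first stage used here is a Bonferroni procedure at level `alpha0`, which is merely one convenient FWER-controlling, non-increasing choice and is an assumption of this sketch. The function names `step_up` and `two_stage_fwer` are hypothetical.

```python
def step_up(pvalues, thresholds):
    """Generic step-up procedure: reject the k smallest p-values, where k is
    the largest i such that the i-th smallest p-value is <= thresholds[i-1]."""
    order = sorted(range(len(pvalues)), key=lambda h: pvalues[h])
    k = 0
    for i, h in enumerate(order, start=1):
        if pvalues[h] <= thresholds[i - 1]:
            k = i
    return set(order[:k])


def two_stage_fwer(pvalues, alpha0, alpha1):
    """Sketch of the two-stage procedure with the identity shape function.
    Assumption: the first stage R0 is Bonferroni at level alpha0."""
    m = len(pvalues)
    first_stage = {h for h, p in enumerate(pvalues) if p <= alpha0 / m}
    remaining = m - len(first_stage)          # (over-)estimate of m0
    if remaining == 0:
        # Convention: if the first stage rejects everything, so does the second.
        return set(range(m))
    thresholds = [alpha1 * i / remaining for i in range(1, m + 1)]
    return step_up(pvalues, thresholds)
```

With `alpha0 = alpha1 = alpha / 2`, the thresholds `alpha1 * i / remaining` exceed the LSU thresholds `alpha * i / m` exactly when the first stage rejects more than half of the hypotheses, which is the break-even point discussed above.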

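Our second result is a two-stage procedure where the first stage has controlled FDR. First introduce, for a fixed constant $\kappa \ge 2$, the following function: for $x \in [0,1]$,

$$F_\kappa(x) = \begin{cases} 1 & \text{if } x \le \kappa^{-1}, \\ \dfrac{2\kappa^{-1}}{1 - \sqrt{1 - 4(1-x)\kappa^{-1}}} & \text{otherwise.} \end{cases}$$

If $R_0$ denotes the first stage, we propose using $F_\kappa(|R_0|/m)$ as an (under-)estimation of $\pi_0^{-1}$ at the second stage. We obtain the following result: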
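A minimal numerical sketch of $F_\kappa$ may help in getting a feel for the estimator. It assumes the closed form $F_\kappa(x) = 1$ for $x \le \kappa^{-1}$ and $F_\kappa(x) = 2\kappa^{-1}/(1 - \sqrt{1 - 4(1-x)\kappa^{-1}})$ otherwise, together with the inverse $F_\kappa^{-1}(t) = \kappa^{-1}t^{-2} - t^{-1} + 1$ used in the proof of Theorem 4.3. The function names are hypothetical.

```python
import math


def f_kappa(x, kappa=2.0):
    """F_kappa(x) for x in [0, 1] and kappa >= 2 (closed form assumed above)."""
    if x <= 1.0 / kappa:
        return 1.0
    root = math.sqrt(max(0.0, 1.0 - 4.0 * (1.0 - x) / kappa))
    if root >= 1.0:
        return math.inf          # x = 1: the estimate of 1/pi_0 is unbounded
    return (2.0 / kappa) / (1.0 - root)


def f_kappa_inverse(t, kappa=2.0):
    """Inverse of F_kappa on [1, +inf): x = 1/(kappa t^2) - 1/t + 1."""
    return 1.0 / (kappa * t * t) - 1.0 / t + 1.0
```

For $\kappa = 2$, `f_kappa_inverse(2.0)` evaluates to 0.625, which is the first-stage rejection proportion at which the estimate of $\pi_0^{-1}$ reaches 2.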

Theorem 4.3. Let $\beta$ be a fixed shape function, and $\alpha_0, \alpha_1 \in (0,1)$ such that $\alpha_0 \le \alpha_1$. Denote by $R_0$ the step-up procedure with threshold collection $\Delta_0(i) = \alpha_0\beta(i)/m$. Then the adaptive step-up procedure $R$ with data-dependent threshold collection $\Delta_1(i) = \alpha_1\beta(i)F_\kappa(|R_0|/m)/m$ has FDR upper bounded by $\alpha_1 + \kappa\alpha_0$ in either of the following dependence situations:

- the p-values $(p_h, h \in \mathcal{H})$ are PRDS on $\mathcal{H}_0$ and the shape function is the identity function.

- the p-values have unspecified dependences and $\beta$ is a shape function of the form (4).

For instance, in the PRDS case, the procedure $R$ of Theorem 4.3 with $\kappa = 2$, $\alpha_0 = \alpha/4$ and $\alpha_1 = \alpha/2$, is the adaptive linear step-up procedure at level $\alpha/2$ with the following estimator for $\pi_0^{-1}$:

$$\frac{1}{1 - \sqrt{(2|R_0|/m - 1)_+}},$$

where $|R_0|$ is the number of rejections of the LSU procedure at level $\alpha/4$ and $(\cdot)_+$ denotes the positive part.
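The example can be sketched as follows, under the stated parameter choice ($\kappa = 2$, $\alpha_0 = \alpha/4$, $\alpha_1 = \alpha/2$, identity shape function). This is an illustrative sketch rather than a reference implementation; `lsu_rejections` and `br08_dep` are hypothetical names, and the functions return the number of rejections only.

```python
import math


def lsu_rejections(pvalues, level):
    """Number of rejections of a linear step-up procedure whose i-th
    threshold is level * i / m (level may exceed 1 after rescaling)."""
    m = len(pvalues)
    k = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= level * i / m:
            k = i
    return k


def br08_dep(pvalues, alpha):
    """Two-stage sketch: LSU at alpha/4, then LSU at alpha/2 rescaled by the
    estimate 1 / (1 - sqrt((2|R0|/m - 1)_+)) of 1/pi_0."""
    m = len(pvalues)
    r0 = lsu_rejections(pvalues, alpha / 4)
    root = math.sqrt(max(0.0, 2.0 * r0 / m - 1.0))
    if root >= 1.0:
        return m                      # first stage already rejected everything
    estimate = 1.0 / (1.0 - root)
    return lsu_rejections(pvalues, alpha / 2 * estimate)
```

On a toy family with eight very small p-values out of ten, the first stage rejects 80% of the hypotheses, the estimate exceeds 2, and the second stage can reject one more hypothesis than the plain LSU at level $\alpha$.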

Whether in the PRDS or arbitrary dependences case, with the above choice of parameters, we note that $R$ is less conservative than the non-adaptive step-up procedure with threshold collection $\Delta(i) = \alpha\beta(i)/m$ if $F_2(|R_0|/m) \ge 2$ or equivalently when $R_0$ rejects more than $F_2^{-1}(2) = 62.5\%$ of the null hypotheses. Conversely, $R$ is more conservative otherwise, and we can lose up to a factor 2 in the threshold collection with respect to the standard one-stage version. Therefore, here again this adaptive procedure is only useful in the cases where it is expected that a "large" proportion of null hypotheses can easily be rejected. In particular, when we use Theorem 4.3 in the distribution-free case, it is relevant to choose the shape function $\beta$ from a prior distribution $\nu$ concentrated on the large numbers of $\{1, \dots, m\}$. Finally, note that it is not immediately clear whether this procedure will improve on the one of Theorem 4.2. Namely, with the above choice of parameters, it has to reject more hypotheses in the first step than the procedure of Theorem 4.2 in order to beat the LSU, and the first step is performed at a smaller target level. However, since the first step only controls the FDR, and not the FWER, it can actually be much less conservative.
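The two break-even points quoted above can be checked by a short computation. It uses the inverse $F_\kappa^{-1}(t) = \kappa^{-1}t^{-2} - t^{-1} + 1$ given in the proof of Theorem 4.3, and compares each adaptive threshold collection with $\alpha\beta(i)/m$:

```latex
% Theorem 4.2 with \alpha_0 = \alpha_1 = \alpha/2:
\frac{\alpha}{2}\,\frac{\beta(i)}{m - |R_0|} \;\ge\; \frac{\alpha\,\beta(i)}{m}
\quad\Longleftrightarrow\quad \frac{|R_0|}{m} \;\ge\; \frac{1}{2}.

% Theorem 4.3 with \kappa = 2, \alpha_0 = \alpha/4, \alpha_1 = \alpha/2:
\frac{\alpha}{2}\,\frac{\beta(i)}{m}\,F_2\!\left(\frac{|R_0|}{m}\right) \;\ge\; \frac{\alpha\,\beta(i)}{m}
\quad\Longleftrightarrow\quad F_2\!\left(\frac{|R_0|}{m}\right) \;\ge\; 2
\quad\Longleftrightarrow\quad \frac{|R_0|}{m} \;\ge\; F_2^{-1}(2)
 = \frac{1}{2}\cdot\frac{1}{4} - \frac{1}{2} + 1 = \frac{5}{8}.
```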

To explore this issue, we performed limited experiments in a favorable situation to test the two above procedures, i.e., with a small $\pi_0$. Namely, we considered the simulation setting of Section 3.4 with $\rho = 0.1$, $m_0 = 100$ and $m = 1000$ (hence $\pi_0 = 10\%$) and $\alpha = 5\%$. The common value $\mu$ of the positive means varies in the range $[0, 5]$. Larger values of $\mu$ correspond to a very large proportion of hypotheses that are easy to reject, which favors the first stage of the two above procedures.
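A data-generating sketch for this kind of experiment is given below. It assumes equicorrelated unit-variance Gaussian statistics with pairwise correlation $\rho$, mean 0 under the null and mean $\mu$ under the alternative, and one-sided p-values. These modelling details are assumptions of the sketch and should be checked against Section 3.4. The function name `simulate_pvalues` is hypothetical.

```python
import math
import random


def simulate_pvalues(m=1000, m0=100, rho=0.1, mu=3.0, seed=0):
    """Draw one family of one-sided Gaussian p-values.
    Assumptions: equicorrelation rho, unit variances, the first m0
    hypotheses are true nulls (mean 0), the others have mean mu."""
    rng = random.Random(seed)
    shared = rng.gauss(0.0, 1.0)      # common factor inducing correlation rho
    pvalues, is_null = [], []
    for h in range(m):
        null = h < m0
        mean = 0.0 if null else mu
        noise = rng.gauss(0.0, 1.0)
        x = mean + math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * noise
        # One-sided p-value P[N(0,1) > x], computed via the complementary
        # error function.
        pvalues.append(0.5 * math.erfc(x / math.sqrt(2.0)))
        is_null.append(null)
    return pvalues, is_null
```

The common factor construction gives each pair of statistics covariance $\rho$ while keeping unit marginal variance, so the null p-values remain marginally uniform.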





A positively correlated family of Gaussians satisfies the PRDS assumption (see Benjamini and Yekutieli, 2001), so we use the identity shape function (linear step-up) and compare our procedures against the standard LSU. For the FWER-controlled first stage of Theorem 4.2, we chose a standard Holm procedure (Holm, 1979), which is a step-down procedure with threshold family $t(i) = \alpha m/(m - i + 1)$. In Figure 5, we report the average relative power to the oracle LSU, and the False Non-discovery Rate (FNR), which is the converse of the FDR for type II errors, i.e., the average of the ratio of non-rejected false hypotheses over the total number of non-rejected hypotheses. Since we are in a situation where $\pi_0$ is small, the FNR might actually be a more relevant criterion than the raw power: in this situation, because of the small number of non-rejected hypotheses, two different procedures could have very similar power, close to 1, but noticeably different FNRs.
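For a single simulated data set, the quantities being averaged can be computed as in the sketch below. The function name is hypothetical, the conventions for empty sets (0/0 taken as 0) are assumptions, and the raw power returned here would still need to be divided by the power of the oracle LSU to obtain the relative power reported in the figure.

```python
def error_proportions(rejected, is_null):
    """Return (FDP, FNP, power) for one realization.
    rejected: set of indices of rejected hypotheses.
    is_null:  list of booleans, True when the hypothesis is a true null."""
    m = len(is_null)
    n_rejected = len(rejected)
    false_rejections = sum(1 for h in rejected if is_null[h])
    n_alternatives = sum(1 for null in is_null if not null)
    true_rejections = n_rejected - false_rejections
    missed = n_alternatives - true_rejections     # non-rejected false hypotheses
    n_kept = m - n_rejected                       # non-rejected hypotheses
    fdp = false_rejections / n_rejected if n_rejected else 0.0
    fnp = missed / n_kept if n_kept else 0.0
    power = true_rejections / n_alternatives if n_alternatives else 0.0
    return fdp, fnp, power
```

Averaging the first two outputs over independent repetitions gives Monte Carlo estimates of the FDR and of the FNR, respectively.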

The conclusion is that there exists an (unfortunately relatively small) region where the adaptive procedures improve over the standard LSU in terms of power. In terms of FNR, the improvement is more noticeable and over a larger region. Finally, our two-step adaptive procedure of Theorem 4.3 appears to consistently outperform the baseline of Theorem 4.2. These results are still unsatisfying to the extent that the adaptive procedure improves over the non-adaptive one only in a region limited to some quite particular cases, and underperforms otherwise. Nevertheless, this demonstrates theoretically the possibility of provably adaptive procedures under dependence. Again, this theme appears to have been theoretically studied in only a handful of previous works until now, and significantly improving the theory in this setting is still an open challenge.



[Figure 5: two panels, "Average relative power to [LSU-oracle]" and "False Non-discovery Rate (FNR)".]

Fig. 5. Relative power to oracle and false non-discovery rate (FNR) of the different procedures, as a function of the common alternative hypothesis mean $\mu$. Parameters are $\alpha = 5\%$, $m = 1000$, $\pi_0 = 10\%$, $\rho = 0.1$. "BR08-dep-Holm" corresponds to the procedure of Theorem 4.2 using $\alpha_1 = \alpha_0 = \alpha/2$ and Holm's step-down for the first step, and "BR08-dep" to the procedure of Theorem 4.3 with $\kappa = 2$, $\alpha_0 = \alpha/4$ and $\alpha_1 = \alpha/2$. The shape function $\beta$ is the identity function. Each point is an average over $10^4$ independent repetitions.



Remark 4.4. Some theoretical results for two-stage procedures under possible dependences using a first stage with controlled FWER or controlled FDR appeared earlier (Farcomeni, 2007). However, it appears that in this reference, it is implicitly assumed that the two stages are actually independent, because the proof relies on a conditioning argument wherein FDR control for the second stage still holds conditionally on the first stage output. This is the case for example if the two stages are performed on separate families of p-values corresponding to a new independent observation. Here we specifically wanted to take into account that we use the same collection of p-values for the two stages, and therefore that the two stages cannot be assumed to be independent. In this sense our results are novel with respect to those of Farcomeni (2007).

Remark 4.5. The theoretical problem of adaptive procedures under arbitrary dependences was also considered by Sarkar (2008a) using two-stage procedures. However, the procedures proposed there were reported not to yield any significant improvement over non-adaptive procedures. In fact, in the explicit procedures proposed by Sarkar (2008a), it can be seen that there exists a function $\beta$ of the form (4) such that the second stage is always more conservative (and sometimes by a large factor) than the non-adaptive step-up procedure with threshold collection $\Delta(i) = \alpha\beta(i)/m$, which has FDR bounded by $\pi_0\alpha$ (see Theorem 2.7).

5 Conclusion and discussion 

We proposed several adaptive multiple testing procedures that provably control the FDR under different hypotheses on the dependence of the p-values. Firstly, we introduced the one- and two-stage procedures BR-1S and BR-2S and we proved their theoretical validity when the p-values are independent. The procedure BR-2S is less conservative in general (except in marginal situations) than the adaptive procedure proposed by Benjamini et al. (2006). Extensive simulations showed that these new procedures appear to be robustly controlling the FDR even in a positive dependence situation, which is a very desirable property in practice. This is an advantage with respect to the [Storey-1/2] procedure, which is less conservative but breaks down under positive dependences. Moreover, our simulations showed that the choice of parameter $\lambda = \alpha$ instead of $\lambda = 1/2$ in the Storey procedure resulted in a much more robust procedure under positive dependences, at the price of being slightly more conservative. This fact is supported by a theoretical investigation of the maximally dependent case. These properties do not appear to have been reported before, and put forward Storey-$\alpha$ as a procedure of considerable practical interest.

Secondly, we presented what we think is among the first examples of adaptive multiple testing 
procedures with provable FDR control in the PRDS and distribution-free cases. An important 
difference with respect to earlier works on this topic is that the procedures we introduced here are 
both theoretically founded and can be shown to improve on non-adaptive procedures in certain 
(admittedly limited) circumstances. Although their interest at this point is mainly theoretical, this 
shows in principle that adaptivity can improve performance in a theoretically rigorous way even 
without the independence assumption. 

The proofs of the results have been built upon the notion of self-consistency and other technical tools introduced in a previous work (Blanchard and Roquain, 2008). We believe these tools allow for a more unified approach than in the classical adaptive multiple testing literature, avoiding in particular the need to deal explicitly with the reordered p-values, which can be somewhat cumbersome.

Another advantage of this approach is that it can be extended in a relatively straightforward manner to the case of weighted FDR, that is, the quantity (2) where the cardinality measure $|\cdot|$ has been replaced by a general measure $W(R) = \sum_{h \in R} w_h$ (with $W(\mathcal{H}) = \sum_{h \in \mathcal{H}} w_h = m$). This makes it possible in particular to recover results very similar to those of Benjamini and Heller (2007) and can also be used to prove that a (generalized) Storey estimator can be used to control the weighted FDR. The modifications needed to include this generalization are relatively minor; we omit the details here and refer the reader to Blanchard and Roquain (2008) to see how the case of weighted FDR can be handled using the same technical tools.

There remain a vast number of open issues concerning adaptive procedures. We first want to underline once more that the theory for adaptive procedures under dependence is still underdeveloped. It might actually be too restrictive to look for procedures having theoretically controlled FDR uniformly over arbitrary dependence situations such as what we studied in Section 4. An interesting future theoretical direction could be to prove that some of the adaptive procedures showing good robustness in our simulations actually have controlled FDR under some types of dependence, at least when the p-values are in some sense not too far from being independent.
6 Proofs of the results



6.1 Proofs for Section 3 

The following proofs use the notations $\mathbf{p}_{0,h}$ and $\mathbf{p}_{-h}$ defined at the beginning of Section 3.2.

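Proof of Theorem 3.1. Let $R$ denote a non-increasing self-consistent procedure with respect to $\Delta$ defined in (5); by definition $R$ satisfies

$$R \subset \left\{ h \in \mathcal{H} \;\middle|\; p_h \le \min\left( (1-\lambda)\frac{\alpha |R|}{m - |R| + 1},\, \lambda \right) \right\}.$$

Therefore, we have

$$\begin{aligned}
\mathrm{FDR}(R) &= \sum_{h \in \mathcal{H}_0} \mathbb{E}\left[ \frac{\mathbf{1}\{h \in R(\mathbf{p})\}}{|R(\mathbf{p})|} \right] \\
&\le \sum_{h \in \mathcal{H}_0} \mathbb{E}\left[ \frac{\mathbf{1}\left\{ p_h \le (1-\lambda)\frac{\alpha |R(\mathbf{p})|}{m - |R(\mathbf{p})| + 1} \right\}}{|R(\mathbf{p})|} \right] \\
&\le \sum_{h \in \mathcal{H}_0} \mathbb{E}\left[ \mathbb{E}\left[ \frac{\mathbf{1}\left\{ p_h \le (1-\lambda)\frac{\alpha |R(\mathbf{p})|}{m - |R(\mathbf{p}_{0,h})| + 1} \right\}}{|R(\mathbf{p})|} \;\middle|\; \mathbf{p}_{-h} \right] \right] \\
&\le (1-\lambda)\alpha \sum_{h \in \mathcal{H}_0} \mathbb{E}\left[ \frac{1}{m - |R(\mathbf{p}_{0,h})| + 1} \right].
\end{aligned}$$

The second inequality above comes from $|R(\mathbf{p})| \le |R(\mathbf{p}_{0,h})|$, which itself holds because $|R|$ is coordinate-wise non-increasing in each p-value. The last inequality is obtained with Lemma 7.1 of Section 7 with $U = p_h$, $g(U) = |R(\mathbf{p}_{-h}, U)|$ and $c = \frac{(1-\lambda)\alpha}{m - |R(\mathbf{p}_{0,h})| + 1}$; this applies because the distribution of $p_h$ conditionally on $\mathbf{p}_{-h}$ is stochastically lower bounded by a uniform distribution, because $|R|$ is coordinate-wise non-increasing, and because $\mathbf{p}_{0,h}$ depends only on the p-values of $\mathbf{p}_{-h}$. Finally, since the threshold collection of $R$ is upper bounded by $\lambda$, we get

$$(1-\lambda)\,\mathbb{E}\left[ \frac{m}{m - |R(\mathbf{p}_{0,h})| + 1} \right] \le \mathbb{E}\, G_1(\mathbf{p}_{0,h}),$$

where $G_1$ is the Storey estimator with parameter $\lambda$. We then use $\mathbb{E}\, G_1(\mathbf{p}_{0,h}) \le \pi_0^{-1}$ (see proof of Corollary 3.4) to conclude. ■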
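To make the object of this proof concrete, here is a minimal sketch of a step-up procedure with the threshold collection $\Delta(i) = \min\big((1-\lambda)\alpha i/(m - i + 1), \lambda\big)$ appearing in the self-consistency condition. The function name `br_1s` is hypothetical, and the code is an illustration rather than the authors' implementation.

```python
def br_1s(pvalues, alpha, lam):
    """Step-up procedure with thresholds
    Delta(i) = min((1 - lam) * alpha * i / (m - i + 1), lam)."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda h: pvalues[h])
    k = 0
    for i, h in enumerate(order, start=1):
        threshold = min((1.0 - lam) * alpha * i / (m - i + 1), lam)
        if pvalues[h] <= threshold:
            k = i                      # keep the largest index passing the test
    return set(order[:k])
```

Because every threshold is capped at $\lambda$, no rejected p-value exceeds $\lambda$, which is the property used in the last step of the proof to compare with the Storey estimator.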



Proof of Lemma 3.2. Denote by $G(t) = \pi_0 t + (1 - \pi_0)F(t)$ the cdf of the p-values under the random effects mixture model. Let us denote by $t_m$ the threshold of the LSU procedure. The proportion of rejected hypotheses from the initial pool is then exactly $G_m(t_m)$, where $G_m$ is the empirical cdf of the p-values. It was proved by Genovese and Wasserman (2002) under the random effects model that, as $m$ tends to infinity, the LSU threshold $t_m$ converges in probability to $t^*$, which is the largest point $t \in [0,1]$ such that $G(t) = \alpha^{-1}t$. Since $G_m$ converges in probability uniformly to $G$, we deduce that the proportion of rejected hypotheses converges to $\alpha^{-1}t^*$ in probability; hence, if $t^* > \alpha^2$, the probability that the proportion of rejected hypotheses is less than $\alpha + 1/m$ converges to zero; and conversely it converges to 1 if $t^* < \alpha^2$.

The definition of t* and the expression for G in the Gaussian mean shift model imply the 
following relation whenever t* > 0: 
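For reference, the relation can be written out under an explicit assumption. Assuming the one-sided Gaussian mean shift model, in which the cdf of the p-values under the alternative is $F(t) = \bar\Phi(\bar\Phi^{-1}(t) - \mu)$ with $\bar\Phi$ the standard Gaussian upper tail function (this parametrization is an assumption made here, not a quotation), solving $G(t^*) = \alpha^{-1}t^*$ for $\mu$ gives:

```latex
\pi_0 t^* + (1 - \pi_0)\,\bar\Phi\!\left(\bar\Phi^{-1}(t^*) - \mu\right) = \alpha^{-1} t^*
\quad\Longleftrightarrow\quad
\mu = \bar\Phi^{-1}(t^*) - \bar\Phi^{-1}\!\left( \frac{(\alpha^{-1} - \pi_0)\, t^*}{1 - \pi_0} \right).
```

Setting $t^* = \alpha^2$ in this expression yields the critical value $\mu^* = \bar\Phi^{-1}(\alpha^2) - \bar\Phi^{-1}\big(\alpha(1 - \pi_0\alpha)/(1 - \pi_0)\big)$, which is well defined when $\pi_0 < (1+\alpha)^{-1}$.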
It is easily seen that if $\pi_0 < (1+\alpha)^{-1}$, the quantity $\mu^*$ in the statement of the lemma is well defined and we have $t^* > \alpha^2$ for $\mu > \mu^*$. This gives the first part of the result.

Conversely, if $\pi_0 > (1+\alpha)^{-1}$ we have $t^* < \alpha^2$, and if $\pi_0 < (1+\alpha)^{-1}$ but $\mu < \mu^*$, we also have $t^* < \alpha^2$; this leads to the second part of the result. ■
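The threshold behaviour at $\mu^*$ can be checked numerically with the standard library. The sketch below assumes the one-sided Gaussian mean shift model, $F(t) = \bar\Phi(\bar\Phi^{-1}(t) - \mu)$, which is an assumption about the model of Section 3.4. All function names are hypothetical.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()


def upper_tail(x):
    """Standard Gaussian upper tail function, P[N(0,1) > x]."""
    return 1.0 - STD_NORMAL.cdf(x)


def upper_quantile(t):
    """Inverse of upper_tail, for 0 < t < 1."""
    return STD_NORMAL.inv_cdf(1.0 - t)


def mixture_cdf(t, pi0, mu):
    """G(t) = pi0 * t + (1 - pi0) * F(t) under the assumed mean shift model."""
    return pi0 * t + (1.0 - pi0) * upper_tail(upper_quantile(t) - mu)


def mu_star(alpha, pi0):
    """Mean shift for which G(alpha^2) = alpha, i.e. t* = alpha^2.
    Requires pi0 < 1 / (1 + alpha) so that the inner argument is below 1."""
    target = alpha * (1.0 - pi0 * alpha) / (1.0 - pi0)
    return upper_quantile(alpha ** 2) - upper_quantile(target)
```

Since $G(\alpha^2)$ increases with $\mu$, the crossing point $t^*$ lies above $\alpha^2$ for $\mu > \mu^*$ and below it for $\mu < \mu^*$, in line with the two parts of the lemma.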



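Proof of Theorem 3.3. By definition of self-consistency, the procedure $R$ satisfies

$$R \subset \{ h \in \mathcal{H} \mid p_h \le \alpha |R|\, G(\mathbf{p})/m \}.$$

Therefore,

$$\mathrm{FDR}(R) = \sum_{h \in \mathcal{H}_0} \mathbb{E}\left[ \frac{\mathbf{1}\{h \in R(\mathbf{p})\}}{|R(\mathbf{p})|} \right] \le \sum_{h \in \mathcal{H}_0} \mathbb{E}\left[ \frac{\mathbf{1}\{p_h \le \alpha |R(\mathbf{p})|\, G(\mathbf{p})/m\}}{|R(\mathbf{p})|} \right].$$

Since $G$ is non-increasing, we get:

$$\begin{aligned}
\mathrm{FDR}(R) &\le \sum_{h \in \mathcal{H}_0} \mathbb{E}\left[ \frac{\mathbf{1}\{p_h \le \alpha |R(\mathbf{p})|\, G(\mathbf{p}_{0,h})/m\}}{|R(\mathbf{p})|} \right] \\
&= \sum_{h \in \mathcal{H}_0} \mathbb{E}\left[ \mathbb{E}\left[ \frac{\mathbf{1}\{p_h \le \alpha |R(\mathbf{p})|\, G(\mathbf{p}_{0,h})/m\}}{|R(\mathbf{p})|} \;\middle|\; \mathbf{p}_{-h} \right] \right] \\
&\le \frac{\alpha}{m} \sum_{h \in \mathcal{H}_0} \mathbb{E}\, G(\mathbf{p}_{0,h}).
\end{aligned}$$

The last step is obtained with Lemma 7.1 of Section 7 with $U = p_h$, $g(U) = |R(\mathbf{p}_{-h}, U)|$ and $c = \alpha G(\mathbf{p}_{0,h})/m$, because the distribution of $p_h$ conditionally on $\mathbf{p}_{-h}$ is stochastically lower bounded by a uniform distribution, $|R|$ is coordinate-wise non-increasing and $\mathbf{p}_{0,h}$ depends only on the p-values of $\mathbf{p}_{-h}$. ■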



Proof of Corollary 3.4. We prove that the sufficient condition of Theorem 3.3 holds for the non-increasing estimators $G_i$, $i = 1, 2, 3, 4$. To that end, we reproduce here without major changes the arguments used by Benjamini et al. (2006). The bound for $G_1$ is obtained using Lemma 7.4 (see below) with $k = m_0$ and $q = 1 - \lambda$: for all $h \in \mathcal{H}_0$,

$$\mathbb{E}[G_1(\mathbf{p}_{0,h})] \le m(1-\lambda)\, \mathbb{E}\left[ \left( \sum_{h' \in \mathcal{H}_0 \setminus \{h\}} \mathbf{1}\{p_{h'} > \lambda\} + 1 \right)^{-1} \right] \le \pi_0^{-1}.$$

The proof for $G_3$ and $G_4$ is deduced from the one of $G_1$ because $G_3 \le G_4 \le G_1$ pointwise.
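As a concrete illustration of the plug-in principle, the sketch below computes the Storey estimator $G_1(\mathbf{p}) = (1-\lambda)m/(\sum_h \mathbf{1}\{p_h > \lambda\} + 1)$ and plugs it into a linear step-up procedure with thresholds $\alpha i\, G/m$. The function names are hypothetical and the code only returns the number of rejections.

```python
def storey_g1(pvalues, lam):
    """Storey-type estimate of 1/pi_0:
    (1 - lam) * m / (#{h : p_h > lam} + 1)."""
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return (1.0 - lam) * m / (above + 1)


def plug_in_lsu_rejections(pvalues, alpha, g):
    """Number of rejections of the adaptive linear step-up procedure
    with thresholds alpha * i * g / m, for a given estimate g of 1/pi_0."""
    m = len(pvalues)
    k = 0
    for i, p in enumerate(sorted(pvalues), start=1):
        if p <= alpha * i * g / m:
            k = i
    return k
```

A typical call would be `plug_in_lsu_rejections(p, alpha, storey_g1(p, lam))`, with `lam` set to 1/2 or to `alpha` depending on the variant under study.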

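Let us prove that $\mathbb{E}\, G_2(\mathbf{p}_{0,h}) \le \pi_0^{-1}$, for any $h \in \mathcal{H}_0$ and any $k_0 \in \{1, \dots, m\}$. If $k_0 \le m_1 + 1$, the result is trivial. Suppose now $k_0 > m_1 + 1$. Introduce the following auxiliary notation: for $\mathbf{p}$ a family of p-values indexed by $\mathcal{H}$, and a subset $B \subset \mathcal{H}$, denote by $S(i, \mathbf{p}, B)$ the $i$-th ordered p-value of the subfamily $(p_{h'})_{h' \in B}$. Pointwise, $G_2$ can be rewritten as:

$$\begin{aligned}
G_2(\mathbf{p}_{0,h}) &= \frac{m}{m + 1 - k_0}\left( 1 - S(k_0, \mathbf{p}_{0,h}, \mathcal{H}) \right) \\
&= \frac{m}{m + 1 - k_0}\left( 1 - S(k_0 - 1, \mathbf{p}, \mathcal{H} \setminus \{h\}) \right) \\
&\le \frac{m}{m + 1 - k_0}\left( 1 - S(k_0 - 1 - m + m_0, \mathbf{p}, \mathcal{H}_0 \setminus \{h\}) \right),
\end{aligned}$$

the latter coming from the relation $S(i, \mathbf{p}, A) \ge S(i - |A \setminus B|, \mathbf{p}, B)$, for all finite sets $B \subset A$ and integers $i > |A \setminus B|$. Therefore, using that $m_0 - 1$ independent random variables with marginal distributions stochastically lower bounded by a uniform law have a $j$-th ordered value on average larger than $j/m_0$, we obtain:

$$\mathbb{E}\, G_2(\mathbf{p}_{0,h}) \le \frac{m}{m + 1 - k_0}\left( 1 - \frac{k_0 - 1 - m + m_0}{m_0} \right) = \pi_0^{-1}.$$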
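The final simplification can be verified exactly with rational arithmetic. The sketch below also spells out the quantile estimator in the form $G_2(\mathbf{p}) = m(1 - p_{(k_0)})/(m + 1 - k_0)$, which is the form used in the proof of Proposition 3.8. Function names are hypothetical.

```python
from fractions import Fraction


def quantile_g2(pvalues, k0):
    """Quantile-based estimate of 1/pi_0:
    m * (1 - p_(k0)) / (m + 1 - k0), with p_(k0) the k0-th smallest p-value."""
    m = len(pvalues)
    return m * (1 - sorted(pvalues)[k0 - 1]) / (m + 1 - k0)


def g2_mean_bound(m, m0, k0):
    """Bound obtained by replacing the j-th ordered null p-value by j/m0,
    with j = k0 - 1 - m + m0 (exact rational arithmetic)."""
    j = k0 - 1 - m + m0
    return Fraction(m, m + 1 - k0) * (1 - Fraction(j, m0))
```

For any admissible $k_0 > m_1 + 1$, `g2_mean_bound(m, m0, k0)` equals `Fraction(m, m0)`, that is $\pi_0^{-1}$, independently of $k_0$.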



Proof of Proposition 3.8. Let us first consider adaptive one-stage procedures: for any step-up procedure $R$ of threshold $\Delta(i) = \alpha\beta(i)/m$ we easily derive that the probability that $R$ makes any rejection is

$$\mathbb{P}[\exists i \mid p_{(i)} \le \Delta(i)] = \mathbb{P}[\exists i \mid p_1 \le \Delta(i)] = \mathbb{P}[p_1 \le \Delta(m)] = \Delta(m),$$

which is $\mathrm{FDR}(R)$ because $m_0 = m$. The results for BR-1S-$\lambda$ and FDR09-$\eta$ follow.

With the same reasoning, we find that for any plug-in adaptive linear step-up procedure $R$ that uses an estimator $G(\mathbf{p})$,

$$\mathrm{FDR}(R) = \mathbb{P}[p_1 \le \alpha G(\mathbf{p})]. \tag{10}$$

Next, for the Storey plug-in procedure, we have $G_1(p_1, \dots, p_1) = (1-\lambda)m/(m\mathbf{1}\{p_1 > \lambda\} + 1)$, so that applying (10), we get

$$\begin{aligned}
\mathrm{FDR}(\text{Storey-}\lambda) &= \mathbb{P}[p_1 \le \alpha G_1(\mathbf{p})] \\
&= \mathbb{P}[p_1 \le \lambda,\ p_1 \le \alpha(1-\lambda)m] + \mathbb{P}[p_1 > \lambda,\ p_1 \le \alpha(1-\lambda)m/(m+1)].
\end{aligned}$$

For the quantile procedure, we have

$$\mathbb{P}[p_1 \le \alpha(1 - p_1)m/(m - k_0 + 1)] = \mathbb{P}[p_1((1+\alpha)m - k_0 + 1) \le \alpha m] = \frac{\alpha}{1 + \alpha - (k_0 - 1)/m}.$$

For the BKY06 procedure, we simply remark that since the linear step-up procedure of level $\lambda$ rejects all the hypotheses when $p_1 \le \lambda$ and rejects no hypothesis otherwise, the estimators $G_1$ and $G_3$ are equal in this case. The proof for BR-2S-$\lambda$ is similar. ■
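The two closed-form expressions above are easy to evaluate numerically. The sketch below assumes, as in this proof, that all p-values are equal to a single uniform variable $p_1$ and that $m_0 = m$. The function names are hypothetical.

```python
def fdr_storey_max_dep(alpha, lam, m):
    """P[p1 <= lam, p1 <= alpha(1-lam)m] + P[p1 > lam, p1 <= alpha(1-lam)m/(m+1)]
    for p1 uniform on [0, 1]."""
    bound_low = min(1.0, alpha * (1.0 - lam) * m)
    bound_high = min(1.0, alpha * (1.0 - lam) * m / (m + 1))
    return min(lam, bound_low) + max(0.0, bound_high - lam)


def fdr_quantile_max_dep(alpha, k0, m):
    """alpha / (1 + alpha - (k0 - 1) / m), capped at 1."""
    return min(1.0, alpha / (1.0 + alpha - (k0 - 1) / m))
```

With $\alpha = 5\%$ and $m = 1000$, the first function gives 0.5 for $\lambda = 1/2$ and 0.05 for $\lambda = \alpha$, which illustrates the contrast between the two parameter choices in the maximally dependent case.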

6.2 Proofs for Section 4 

We begin with a technical lemma that will be useful for proving both Theorem 4.2 and Theorem 4.3. It is related to techniques previously introduced by Blanchard and Roquain (2008).

Lemma 6.1. Assume $R$ is a multiple testing procedure satisfying the self-consistency condition:

$$R \subset \{ h \in \mathcal{H} \mid p_h \le \alpha G(\mathbf{p})\beta(|R|)/m \},$$

where $G(\mathbf{p})$ is a data-dependent factor. Then the following inequality holds:

$$\mathrm{FDR}(R) \le \alpha + \mathbb{E}\left[ \frac{|R \cap \mathcal{H}_0|}{|R|}\, \mathbf{1}\{|R| > 0\}\, \mathbf{1}\{G(\mathbf{p}) > \pi_0^{-1}\} \right], \tag{11}$$

under either of the following conditions:

- the p-values $(p_h, h \in \mathcal{H})$ are PRDS on $\mathcal{H}_0$, $R$ is non-increasing and $\beta$ is the identity function,
- the p-values have unspecified dependences and $\beta$ is a shape function of the form (4).
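Proof. We have

$$\begin{aligned}
\mathrm{FDR}(R) &= \mathbb{E}\left[ \frac{|R \cap \mathcal{H}_0|}{|R|}\, \mathbf{1}\{|R| > 0\} \right] \\
&= \mathbb{E}\left[ \frac{|R \cap \mathcal{H}_0|}{|R|}\, \mathbf{1}\{|R| > 0\}\, \mathbf{1}\{G \le \pi_0^{-1}\} \right] + \mathbb{E}\left[ \frac{|R \cap \mathcal{H}_0|}{|R|}\, \mathbf{1}\{|R| > 0\}\, \mathbf{1}\{G > \pi_0^{-1}\} \right] \\
&\le \sum_{h \in \mathcal{H}_0} \mathbb{E}\left[ \frac{\mathbf{1}\{p_h \le \alpha\beta(|R|)/m_0\}}{|R|} \right] + \mathbb{E}\left[ \frac{|R \cap \mathcal{H}_0|}{|R|}\, \mathbf{1}\{|R| > 0\}\, \mathbf{1}\{G > \pi_0^{-1}\} \right].
\end{aligned}$$

The desired conclusion will therefore hold if we establish that for any $h \in \mathcal{H}_0$, and $c > 0$:

$$\mathbb{E}\left[ \frac{\mathbf{1}\{p_h \le c\beta(|R|)\}}{|R|} \right] \le c.$$

In the distribution-free case, this is a direct consequence of Lemma 7.3 of Section 7 with $U = p_h$ and $V = |R|$. For the PRDS case, we note that since $|R(\mathbf{p})|$ is coordinate-wise non-increasing in each p-value, for any $v > 0$, $D = \{ z \in [0,1]^{\mathcal{H}} \mid |R(z)| < v \}$ is a measurable non-decreasing set, so that the PRDS property implies that $u \mapsto \mathbb{P}(|R| < v \mid p_h = u)$ is non-decreasing. This implies that $u \mapsto \mathbb{P}(|R| < v \mid p_h \le u)$ is non-decreasing by the following argument (see also Lehmann, 1966, cited by Benjamini and Yekutieli, 2001, and Blanchard and Roquain, 2008): for $u \le u'$, putting $\gamma = \mathbb{P}[p_h \le u \mid p_h \le u']$,

$$\begin{aligned}
\mathbb{P}[\mathbf{p} \in D \mid p_h \le u'] &= \mathbb{E}\big[ \mathbb{P}[\mathbf{p} \in D \mid p_h] \;\big|\; p_h \le u' \big] \\
&= \gamma\, \mathbb{E}\big[ \mathbb{P}[\mathbf{p} \in D \mid p_h] \;\big|\; p_h \le u \big] + (1 - \gamma)\, \mathbb{E}\big[ \mathbb{P}[\mathbf{p} \in D \mid p_h] \;\big|\; u < p_h \le u' \big] \\
&\ge \mathbb{E}\big[ \mathbb{P}[\mathbf{p} \in D \mid p_h] \;\big|\; p_h \le u \big] = \mathbb{P}[\mathbf{p} \in D \mid p_h \le u].
\end{aligned}$$

We can then apply Lemma 7.2 of Section 7 with $U = p_h$ and $V = |R|$.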



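Proof of Theorem 4.2. By definition of a step-up procedure, the two-stage procedure $R$ satisfies the assumption of Lemma 6.1 for $G(\mathbf{p}) = (1 - |R_0|/m)^{-1}$, where $R_0$ is the first stage with FWER controlled at level $\alpha_0$. Furthermore, it is easy to check that $|R|$ is non-increasing as a function of each p-value (since $|R_0|$ is). Then, we can apply Lemma 6.1, and from inequality (11) we deduce

$$\begin{aligned}
\mathrm{FDR}(R) &\le \alpha_1 + \mathbb{E}\left[ \frac{|R \cap \mathcal{H}_0|}{|R|}\, \mathbf{1}\{|R| > 0\}\, \mathbf{1}\left\{ \left(1 - \frac{|R_0|}{m}\right)^{-1} > \pi_0^{-1} \right\} \right] \\
&\le \alpha_1 + \mathbb{P}[R_0 \cap \mathcal{H}_0 \neq \emptyset] \\
&\le \alpha_1 + \alpha_0,
\end{aligned}$$

where the second inequality holds because $(1 - |R_0|/m)^{-1} > \pi_0^{-1}$ means $|R_0| > m - m_0$, which forces $R_0$ to contain at least one true null hypothesis.

In the case where $R_0$ rejects all hypotheses, we assumed implicitly that the second stage also does.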



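Proof of Theorem 4.3. Assume $\pi_0 > 0$ (otherwise the result is trivial). By definition of a step-up procedure, the two-stage procedure $R$ satisfies the assumption of Lemma 6.1 for $G(\mathbf{p}) = F_\kappa(|R_0|/m)$, where $R_0$ is the first stage. Furthermore, it is easy to check that $|R|$ is non-increasing as a function of each p-value (since $|R_0|$ is). Then, we can apply Lemma 6.1, and from inequality (11) we deduce

$$\mathrm{FDR}(R) \le \alpha_1 + \mathbb{E}\left[ \frac{|R \cap \mathcal{H}_0|}{|R|}\, \mathbf{1}\{|R| > 0\}\, \mathbf{1}\{F_\kappa(|R_0|/m) > \pi_0^{-1}\} \right] \le \alpha_1 + m_0\, \mathbb{E}\left[ \frac{\mathbf{1}\{F_\kappa(|R_0|/m) > \pi_0^{-1}\}}{|R_0|} \right].$$

For the second inequality, we have used the two following facts:

(i) $F_\kappa(|R_0|/m) > \pi_0^{-1}$ implies $|R_0| > 0$,

(ii) because of the assumption $\alpha_0 \le \alpha_1$ and $F_\kappa \ge 1$, the output of the second step is necessarily a set containing at least the output of the first step. Hence $|R| \ge |R_0|$.

Let us now concentrate on further bounding this second term. For this, first consider the generalized inverse of $F_\kappa$, $F_\kappa^{-1}(t) = \inf\{ x \mid F_\kappa(x) > t \}$. Since $F_\kappa$ is a non-decreasing left-continuous function, we have $F_\kappa(x) > t \Leftrightarrow x > F_\kappa^{-1}(t)$. Furthermore, the expression of $F_\kappa^{-1}$ is given by: $\forall t \in [1, +\infty)$, $F_\kappa^{-1}(t) = \kappa^{-1}t^{-2} - t^{-1} + 1$ (providing in particular that $F_\kappa^{-1}(\pi_0^{-1}) > 1 - \pi_0$). Hence

$$\begin{aligned}
m_0\, \mathbb{E}\left[ \frac{\mathbf{1}\{F_\kappa(|R_0|/m) > \pi_0^{-1}\}}{|R_0|} \right] &\le m_0\, \mathbb{E}\left[ \frac{\mathbf{1}\{|R_0|/m > F_\kappa^{-1}(\pi_0^{-1})\}}{|R_0|} \right] \\
&\le \frac{\pi_0}{F_\kappa^{-1}(\pi_0^{-1})}\, \mathbb{P}\left[ |R_0|/m \ge F_\kappa^{-1}(\pi_0^{-1}) \right].
\end{aligned} \tag{12}$$

Now, by assumption, the FDR of the first step $R_0$ is controlled at level $\pi_0\alpha_0$, so that

$$\begin{aligned}
\pi_0\alpha_0 &\ge \mathbb{E}\left[ \frac{|R_0 \cap \mathcal{H}_0|}{|R_0|}\, \mathbf{1}\{|R_0| > 0\} \right] \\
&\ge \mathbb{E}\left[ \frac{|R_0| + m_0 - m}{|R_0|}\, \mathbf{1}\{|R_0| > 0\} \right] \\
&= \mathbb{E}\left[ [1 + (\pi_0 - 1)Z^{-1}]\, \mathbf{1}\{Z > 0\} \right],
\end{aligned}$$

where we denoted by $Z$ the random variable $|R_0|/m$. Hence by Markov's inequality, for all $t > 1 - \pi_0$,

$$\mathbb{P}[Z \ge t] \le \mathbb{P}\left[ [1 + (\pi_0 - 1)Z^{-1}]\, \mathbf{1}\{Z > 0\} \ge 1 + (\pi_0 - 1)t^{-1} \right] \le \frac{\pi_0\alpha_0}{1 + (\pi_0 - 1)t^{-1}};$$

choosing $t = F_\kappa^{-1}(\pi_0^{-1})$ and using this into (12), we obtain

$$m_0\, \mathbb{E}\left[ \frac{\mathbf{1}\{F_\kappa(|R_0|/m) > \pi_0^{-1}\}}{|R_0|} \right] \le \frac{\pi_0}{t} \cdot \frac{\pi_0\alpha_0}{1 + (\pi_0 - 1)t^{-1}} = \frac{\alpha_0\pi_0^2}{t - 1 + \pi_0}.$$

If we want this last quantity to be less than $\kappa\alpha_0$, this yields the condition $F_\kappa^{-1}(\pi_0^{-1}) \ge \kappa^{-1}\pi_0^2 - \pi_0 + 1$, and this is true from the expression of $F_\kappa^{-1}$ (note that this is how the formula for $F_\kappa$ was determined in the first place). ■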
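The last step can be made explicit by substitution. Writing $t = F_\kappa^{-1}(\pi_0^{-1})$ and using the expression $F_\kappa^{-1}(t) = \kappa^{-1}t^{-2} - t^{-1} + 1$ stated in the proof, the condition in fact holds with equality:

```latex
t = F_\kappa^{-1}(\pi_0^{-1}) = \kappa^{-1}\pi_0^{2} - \pi_0 + 1
\quad\Longrightarrow\quad
t - 1 + \pi_0 = \kappa^{-1}\pi_0^{2},
\qquad\text{so that}\qquad
\frac{\pi_0}{t}\cdot\frac{\pi_0\,\alpha_0}{1 + (\pi_0 - 1)\,t^{-1}}
 = \frac{\alpha_0\,\pi_0^{2}}{t - 1 + \pi_0}
 = \frac{\alpha_0\,\pi_0^{2}}{\kappa^{-1}\pi_0^{2}}
 = \kappa\,\alpha_0 .
```

Together with the term $\alpha_1$ from Lemma 6.1, this gives the bound $\alpha_1 + \kappa\alpha_0$ of Theorem 4.3.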



7 Probabilistic lemmas 

The three following lemmas have been established in a previous work (see Blanchard and Roquain, 2008, Lemma 3.2).

Lemma 7.1. Let $g : [0,1] \to (0, \infty)$ be a non-increasing function. Let $U$ be a random variable which has a distribution stochastically lower bounded by a uniform distribution, that is, $\forall u \in [0,1]$, $\mathbb{P}(U \le u) \le u$. Then, for any constant $c > 0$, we have

$$\mathbb{E}\left[ \frac{\mathbf{1}\{U \le c\,g(U)\}}{g(U)} \right] \le c.$$

Lemma 7.2. Let $U, V$ be two non-negative real variables. Assume the following:

1. The distribution of $U$ is stochastically lower bounded by a uniform distribution, that is, $\forall u \in [0,1]$, $\mathbb{P}(U \le u) \le u$.
2. The conditional distribution of $V$ given $U \le u$ is stochastically decreasing in $u$, that is,

$$\forall v \ge 0 \quad \forall\, 0 \le u \le u', \quad \mathbb{P}(V < v \mid U \le u) \le \mathbb{P}(V < v \mid U \le u').$$

Then, for any constant $c > 0$, we have

$$\mathbb{E}\left[ \frac{\mathbf{1}\{U \le cV\}}{V} \right] \le c.$$

Lemma 7.3. Let $U, V$ be two non-negative real variables and $\beta$ be a function of the form (4). Assume that the distribution of $U$ is stochastically lower bounded by a uniform distribution, that is, $\forall u \in [0,1]$, $\mathbb{P}(U \le u) \le u$. Then, for any constant $c > 0$, we have

$$\mathbb{E}\left[ \frac{\mathbf{1}\{U \le c\beta(V)\}}{V} \right] \le c.$$

The following lemma was stated by Benjamini et al. (2006). It is a major point when we estimate $\pi_0^{-1}$ in the independent case. The proof is left to the reader.

Lemma 7.4. For any $k \ge 2$, $q \in\, ]0,1]$, let $Y$ be a binomial random variable with parameters $(k-1, q)$; then the following holds:

$$\mathbb{E}[1/(1+Y)] \le 1/(kq).$$
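Although the proof is left to the reader, the inequality is easy to check numerically. The sketch below evaluates the expectation by direct summation over the binomial distribution; it can also be compared with the closed form $(1 - (1-q)^k)/(kq)$, which follows from the identity $\binom{k-1}{y}/(y+1) = \binom{k}{y+1}/k$. The function name is hypothetical.

```python
from math import comb


def mean_inverse_one_plus_binomial(k, q):
    """E[1 / (1 + Y)] for Y ~ Binomial(k - 1, q), by direct summation."""
    n = k - 1
    return sum(
        comb(n, y) * q ** y * (1.0 - q) ** (n - y) / (1 + y)
        for y in range(n + 1)
    )
```

For instance, with $k = 5$ and $q = 0.5$ the expectation equals $(1 - 0.5^5)/2.5 = 0.3875$, below the bound $1/(kq) = 0.4$.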